Preface


This volume comprises a selection of material based on presentations at the 
Eighth Australian Optimization Day, held in McLaren Vale, South Australia, 
in September 2001, and some additional invited contributions by distin- 
guished colleagues, here and overseas. Optimization Day is an annual mini- 
conference in Australia which dates from 1994. It has been successful in bring- 
ing together Australian researchers in optimization and related areas for the 
sharing of ideas and the facilitation of collaborative work. These meetings 
have also attracted some collaborative researchers from overseas. 

This particular meeting was remarkable for the efforts made by some of
the participants to ensure that they could be present. It took place within days of the
September 11 tragedy in New York and the financial collapse of a major 
Australian airline. These events left a number of us without air tickets on the 
eve of the conference. Some participants arrived in South Australia by car, 
having driven up to several thousand kilometers to join the meeting. 

This volume has two parts, one concerning mathematical structure and 
the other applications. The first part begins with a treatment of nondifferen- 
tiability of cone-monotone functions in Banach spaces, showing that whereas 
several regularity properties of cone-monotone functions in finite-dimensional 
spaces carry over to a separable Banach space provided the cone has an in- 
terior, further generalizations are not readily possible. The following chapter 
concerns a comparison between linear and integer programming, particularly 
from a duality perspective. A discrete Farkas lemma is provided and it is 
shown that the existence of a nonnegative integer solution to a linear equa- 
tion can be tested via a linear program. Next, there is a study of connec- 
tions between generalized Lagrangians and generalized penalty functions for 
problems with a single constraint. This is followed by a detailed theoretical 
analysis of convergence of truncates in $\ell_1$ optimal feedback control. The
treatment permits consideration of the frequently occurring case of an objective
function lacking interiority of domain. The optimal control theme continues 
with a study of asymptotic stability of optimal paths in nonconvex prob- 
lems. The purpose of the chapter is to avoid the convexity conditions usually 
assumed in turnpike theory. The succeeding chapter proposes a unified
approach to Pontryagin’s principle for optimal control problems with dynam- 
ics described by a partial differential equation. This is followed by a study 
of a turnpike property for discrete-time control systems in metric spaces. 
A treatment of duality theory for nonlinear programming includes
comparisons of alternative approaches and discussion of how Mond-Weir duality
and Wolfe duality may be combined. There are two linked chapters centered 
on the use of probabilistic structure for designing an improved algorithm for 
the determination of the fundamental matrix of a block-structured M/G/1 
Markov chain. The approach via probabilistic structure makes clear in par- 
ticular the nature of the relationship between the cyclic reduction algorithms 
and the Latouche-Ramaswami algorithm in the QBD case. Part I concludes 
with a chapter developing systematic classes of refinements of Hadamard’s 
inequality, a cornerstone of convex analysis. 

Although Part II of this volume is concerned with applications, a number
of the chapters also possess appreciable theoretical content. Part II opens 
with the estimation of the sizes of correcting codes via formulation in terms 
of extremal graph problems. Previously developed algorithms are used to gen- 
erate new exact solutions and estimates. The second chapter addresses the 
issue of optimal transforms of random vectors. A new transform is presented 
which has advantages over the Karhunen-Loève transform. Theory is
developed and applied to an image reconstruction problem. The following chapter
considers how to assign service capacity in a queueing network to minimize 
expected delay under a cost constraint. Next there is analysis of a control pol- 
icy for stormwater management in a pair of connected tandem dams, where 
a developed mathematical technology is proposed and exhibited. Questions 
relating to the optimal design of linear consecutive-k-out-of-n systems are 
treated in two related chapters. There is a study of optimizing properties of 
plastics containing wood flour; an analysis of the approximation characteris- 
tics of constrained spanning and Steiner tree problems in weighted undirected 
graphs where edge costs and delays satisfy the triangle inequality; heuristics 
for speeding convergence in line search; and the use of alternative mathemat- 
ical programming formulations for a real-world coal-blending problem under 
different scenarios. 
Chapter 1


On the nondifferentiability 
of cone-monotone functions 
in Banach spaces 


Jonathan Borwein and Rafal Goebel 


Abstract In finite-dimensional spaces, cone-monotone functions — a special 
case of which are coordinate-wise nondecreasing functions — possess several 
regularity properties like almost everywhere continuity and differentiability. 
Such facts carry over to a separable Banach space, provided that the cone 
has interior. This chapter shows that further generalizations are not readily 
possible. We display several examples of cone-monotone functions on various
Banach spaces, lacking the regularity expected from their finite-dimensional 
counterparts. 


Key words: Monotone functions, ordered Banach spaces, generating cones, 
differentiability 


1.1 Introduction 


Functions for which $f(y) \ge f(x)$ whenever $y - x$ is an element of a given
convex cone $K$ are called cone monotone with respect to $K$ (or, simply,
$K$-monotone). The simplest examples are provided by nondecreasing functions
on the real line. These have several immediate regularity properties, the most 
intuitive of which may be the at most countable number of discontinuities. 
Regularity properties of coordinate-wise nondecreasing functions on $\mathbb{R}^n$, that
is, functions $f$ for which $f(y) \ge f(x)$ whenever $y_i \ge x_i$ for $i = 1,2,\dots,n$,
were first collected by Chabrillac and Crouzeix [5] and include measurability,
almost everywhere continuity, and almost everywhere Fréchet differentiability.
Note that nondecreasing functions, whether on the real line or $\mathbb{R}^n$,
are cone monotone with respect to the nonnegative cone, either $[0,+\infty)$ or
$[0,+\infty)^n$.

Recently, Borwein, Burke and Lewis [2] showed that functions on a sepa- 
rable Banach space, monotone with respect to a convex cone with nonempty 
interior, are differentiable except at points of an appropriately understood 
null set. The main goal of the current chapter is to demonstrate how possible 
extensions of this result, or other generalizations of finite-dimensional results 
on regularity of cone-monotone functions, fail in a general Banach space. 

Motivation for studying coordinate-wise nondecreasing functions in 
Chabrillac and Crouzeix, and cone-monotone functions by Borwein, Burke 
and Lewis, comes in part from the connections of such functions with Lip- 
schitz, and more generally, directionally Lipschitz functions. Interest in the 
latter stems from the work of Burke, Lewis and Overton [4] on approximation 
of the Clarke subdifferential using gradients, an important idea in practical 
optimization. It turns out that such approximations, as in the Lipschitz
case, are possible in the more general directionally Lipschitz setting.

Before summarizing the properties of nondecreasing functions in finite di- 
mensions, we illustrate their connection with Lipschitz functions. Consider a 
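Lipschitz function $l : \mathbb{R}^n \to \mathbb{R}$ and a $K > 0$ satisfying

\[ |l(x) - l(y)| \le K \|x - y\|_\infty, \quad \text{for all } x, y \in \mathbb{R}^n. \]

Let $z \in \mathbb{R}^n$ be given by $z_i = K$, $i = 1,2,\dots,n$, and define a function
$f : \mathbb{R}^n \to \mathbb{R}$ by $f(x) = l(x) + \langle z, x \rangle$. Then for $x$ and $y$ such that $y_i \ge x_i$,
$i = 1,2,\dots,n$, we have

\[ f(y) - f(x) = \langle z, y - x \rangle + l(y) - l(x)
   \ge K \Big( \sum_{i=1}^n (y_i - x_i) - \max_{i=1,2,\dots,n} (y_i - x_i) \Big) \ge 0. \]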


In effect, f is coordinate-wise nondecreasing. Thus a Lipschitz function de- 
composes into a sum of a linear and a nondecreasing function, and inherits the 
regularity properties of the latter. Borwein, Burke and Lewis show that di- 
rectionally Lipschitz functions on more general spaces admit a similar (local) 
decomposition, into a sum of a linear function and a cone-monotone one (with 
respect to a convex cone with interior). 
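
The finite-dimensional decomposition above can be checked numerically. The following is only a minimal sketch under stated assumptions: the function `l`, its Lipschitz constant `K = 3` (with respect to the supremum norm on $\mathbb{R}^3$), the sampling ranges and the helper name `is_monotone_on_samples` are all hypothetical choices made for illustration and do not come from the chapter. The sketch samples pairs $x \le y$ coordinate-wise and tests whether $f(x) = l(x) + \langle z, x \rangle$ with $z_i = K$ is nondecreasing along them.

```python
import math
import random

K = 3.0  # assumed Lipschitz constant of l with respect to the sup norm

def l(x):
    # Hypothetical Lipschitz function on R^3:
    # |l(x) - l(y)| <= 3 * max_i |x_i - y_i|
    return math.sin(x[0]) + abs(x[1] - x[2])

def f(x):
    # f(x) = l(x) + <z, x> with z_i = K for every coordinate i
    return l(x) + K * sum(x)

def is_monotone_on_samples(trials=10000, seed=0):
    # Sample x and a coordinate-wise larger y, and test f(y) >= f(x)
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in range(3)]
        y = [xi + rng.uniform(0.0, 2.0) for xi in x]
        if f(y) < f(x) - 1e-12:  # small tolerance for rounding
            return False
    return True

print(is_monotone_on_samples())
```

A sampled check of this kind cannot prove monotonicity; it only illustrates the inequality $f(y) - f(x) \ge K(\sum_i (y_i - x_i) - \max_i (y_i - x_i)) \ge 0$ on random data.
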

Theorem 1 (Monotone functions in finite dimensions). Suppose that
$f : \mathbb{R}^n \to \mathbb{R}$ satisfies $f(x) \le f(y)$ whenever $x_i \le y_i$, $i = 1,2,\dots,n$. Then:

(a) $f$ is measurable.

(b) If, for some $d$ with $d_i > 0$ for $i = 1,2,\dots,n$, the function $t \mapsto f(x_0 + td)$
is lower semicontinuous at $t = 0$, then $f$ is lower semicontinuous at $x_0$.
Similarly for upper semicontinuity.

(c) $f$ is almost everywhere continuous.

(d) If $f$ is Gâteaux differentiable at $x_0$, then it is Fréchet differentiable at $x_0$.

(e) Let $\underline{f}$ be the lower semicontinuous hull of $f$. Then $f$ is continuous at $x_0$
if and only if $\underline{f}$ is. Similarly, $f$ is Gâteaux differentiable at $x_0$ if and only
if $\underline{f}$ is, and if these functions are Gâteaux differentiable, their derivatives
agree.

(f) $f$ is almost everywhere Fréchet differentiable.


For details and proofs consult Chabrillac and Crouzeix [5]. Statements (c) 
and (f) generalize the Lebesgue monotone differentiability theorem and in 
fact can be deduced from the two-dimensional version given by S. Saks [9]; 
for details consult Borwein, Burke and Lewis [2]. The Banach space version 
of (c) and (f) of Theorem 1, proved by Borwein, Burke and Lewis, requires a 
notion of a null set in a Banach space. We recall that a Banach space does not 
admit a Haar measure unless it is finite-dimensional, and proceed to make the 
following definitions — for details on these, and other measure-related notions 
we use in this chapter, we refer the reader to Benyamini and Lindenstrauss [1].

Let $X$ be a separable Banach space. A probability measure $\mu$ on $X$ is
called Gaussian if for every $x^* \in X^*$, the measure $\mu_{x^*}$ on the real line,
defined by $\mu_{x^*}(A) = \mu\{y \mid \langle x^*, y \rangle \in A\}$, has a Gaussian distribution. It is
additionally called nondegenerate if for every $x^* \ne 0$ the distribution $\mu_{x^*}$ is
nondegenerate. A Borel set $C \subset X$ is called Gauss null if $\mu(C) = 0$ for every
nondegenerate Gaussian measure $\mu$ on $X$. It is known that the set of points
where a given Lipschitz function $f : X \to \mathbb{R}$ is not Gâteaux differentiable is
Gauss null. This in fact holds for functions with values in a space with the
Radon-Nikodym property (Benyamini and Lindenstrauss [1], Theorem 6.42),
whereas it fails completely for the stronger notion of Fréchet differentiability.


Theorem 2 (Borwein, Burke and Lewis). Let $X$ be a separable space
and let $K \subset X$ be a convex cone with non-empty interior. If $f : X \to
\mathbb{R}$ is $K$-monotone, then it is continuous and Hadamard (so also Gâteaux)
differentiable except at the points of a Gauss null set.


In what follows, we show that all the assumptions in the above theorem 
are necessary, and, more generally, demonstrate how the properties of cone- 
monotone functions described in Theorem 1 fail to extend to a general Banach 
space. Note that the results of Theorems 1 and 2 hold if the functions are 
allowed to take on infinite values, as long as appropriate meaning is given 
to the continuity or differentiability of such functions. Indeed, composing a 
possibly infinite-valued function with, for example, an inverse tangent does 
not change its monotonicity properties, while leading to finite values. We do 
not address this further, and work with finite-valued functions. Moreover, we 
only work with monotone functions which are in fact nondecreasing (homo- 
tone) and note that nonincreasing (antitone) functions can be treated in a 
symmetric fashion. 
1.2 Examples


We begin the section by showing that $K$-monotonicity of a function $f : X \to
\mathbb{R}$, with $X$ finite- or infinite-dimensional, carries little information about the
function if the cone $K$ is not generating: $K - K \ne X$. (Recall that $K - K$
is always a linear subspace of $X$.) An interesting example of a nongenerating
cone $K$ is provided by nonnegative and nondecreasing functions in $X =
C[0,1]$: $K - K$ is the non-closed set of all functions of bounded variation,
dense in (but not equal to) $X$. Note that the example below, as well as
Example 2, could be considered in a general vector space.


Example 1 (a nongenerating cone). Suppose that $K - K \ne X$. Let $L \supset
K - K$ be a hyperplane in $X$, not necessarily closed, and let $l$ be the
(one-dimensional) algebraic complement of $L$ in $X$, that is, $L + l = X$, $L \cap l = 0$.
Such $L$ and $l$ can be constructed using one of the versions of the Separation
Principle: for any point $l_0$ not in the intrinsic core (relative algebraic interior)
of the convex set $K - K$ (as $K - K$ is linear, for any point not in $K - K$)
there exists a linear functional (not necessarily continuous) $\phi$ on $X$ such that
$\langle \phi, l_0 \rangle > \langle \phi, K - K \rangle$; see Holmes [6], page 21. Now let $L = \ker \phi$, $l = \mathbb{R} l_0$.

Define $P_l(x)$ to be the projection of $x$ onto $l$, that is, the unique point of $l$ such
that $x \in P_l(x) + L$. Given any function $g : l \to \mathbb{R}$, let

\[ f(x) = g(P_l(x)). \]

The function $f : X \to \mathbb{R}$ is $K$-monotone: if $y \ge_K x$, that is, $y - x \in K$, then
$P_l(x) = P_l(y)$, and in effect, $f(y) = f(x)$. Now note that at any point $x \in X$,
the function $f$ has the properties of $g$ "in the direction of $l$." Consequently,
in this direction, $f$ may display any desired irregular behavior.


In light of the example above, in what follows we only discuss generating
cones. A cone $K$ is certainly generating when $\operatorname{int} K \ne \emptyset$, as then the linear
subspace $K - K$ has interior. More generally, $K$ is generating when the core
(algebraic interior) of $K$ is not empty. These conditions are met by nonnegative
cones in $C[a,b]$ or $l_\infty$ but not by all generating cones; consider for
example nonnegative cones in $c_0$ or $l_p$, $1 \le p < \infty$. A condition equivalent
to $K - K = X$ is that for any $x, y \in X$ there exists an upper bound of $x$
and $y$: an element $z \in X$ such that $z \ge_K x$, $z \ge_K y$. Consequently, nonnegative
cones are generating in Banach lattices, as in such spaces, the maximum
of two elements (with respect to the nonnegative cone) is defined. Banach
lattices include $L_p$, $l_p$, $C[a,b]$ and $c_0$ spaces. When the subspace $K - K$ is
dense in $X$, $K$ is generating whenever $K - K$ is additionally closed; equivalently,
when the difference of polar cones to $K$ is closed. Finally, in some
cases, measure-theoretic arguments lead to conclusions that $\operatorname{int}(K - K)$ is
nonempty, under some assumptions on $K$. We take advantage of such arguments
in Example 4.
In what follows, we assume that the spaces in question are
infinite-dimensional. The next example involves a nonnegative cone defined through
a Hamel basis. 


Example 2 (lack of continuity, general Banach space). Consider an ordered
Hamel basis in $X$ (every element of $X$ is a finite linear combination of elements
of such a basis, the latter is necessarily uncountable). Let $K$ be a cone
of elements, all of which have nonnegative coordinates. This cone is closed,
convex, generating and has empty interior. Define a function $f : X \to \mathbb{R}$ by
setting $f(x) = 1$ if the last nonzero coordinate of $x$ is positive, and $f(x) = 0$
otherwise. Then $f$ is $K$-monotone: if $y - x \in K$, then the last coordinates
where $x$ and $y$ differ must satisfy $y_\alpha > x_\alpha$. If the $\alpha$-th coordinate is the last
nonzero one for either $x$ or $y$, then $f(y) \ge f(x)$. In the opposite case, the last
nonzero coordinates for $x$ and $y$ agree, and $f(x) = f(y)$.

In any neighborhood of any point in $X$, there are points $y$ with $f(y) = 0$
as well as points $z$ with $f(z) = 1$. Indeed, if $x_\alpha$ is the last nonzero coordinate
of $x$, we can perturb $x$ by assigning a small negative or positive value to
some $x_\beta$, with $\beta > \alpha$. In effect, $f$ is neither lower nor upper semicontinuous.
However, for any nonzero $x \in X$, the function $t \mapsto f(x + tx)$ is continuous at
$t = 0$ (multiplying the last nonzero coordinate by $1 + t$ does not change its
sign). Moreover, the lower semicontinuous hull of $f$ is the constant function
$l(x) = 0$, and the upper semicontinuous hull of $f$ is the constant function
$u(x) = 1$. Both of these are smooth.


The preceding example demonstrated the failure of (c) and (e) of Theorem 
1 in a general Banach space. We note that the cone K in this example is a 
variation on one given by Klee [7], of a pointed convex cone dense in X 
(pointed means that $K \cap -K = 0$). Such a set is obtained by considering all
elements of X for which the last nonzero coordinate in a given Hamel basis 
is positive. 

In the example below, we use a Schauder basis to construct a cone- 
monotone function violating (c) and (e) of Theorem 1 similarly to Example 
2, however continuous at every point in a dense set of directions. 


Example 3 (lack of continuity but continuity on a dense set of directions).
Let $\{x_i\}_{i=1}^\infty$ be a Schauder basis of $X$, and let $\{x_i^*\}_{i=1}^\infty$ be the associated
projections, so that $x = \sum_{i=1}^\infty \langle x, x_i^* \rangle x_i$ for any $x \in X$. We assume that the
basis is unconditional, that is, for any $x$ the sum $\sum_{i=1}^\infty \varepsilon_i \langle x, x_i^* \rangle x_i$ converges
for any combination of $\varepsilon_i = \pm 1$, and consequently, $\sum_{j=1}^\infty \langle x, x_{i_j}^* \rangle x_{i_j}$ converges
for any subsequence $\{i_j\}_{j=1}^\infty$. The standard bases in $c_0$ and $l_p$ with $p < +\infty$
satisfy this condition.

Define a cone $K \subset X$ by $K = \overline{\operatorname{co}}(\operatorname{cone}\{x_i\}_{i=1}^\infty)$, a closed convex hull of
the cone generated by the $x_i$'s; equivalently, $K = \{x \mid \langle x, x_i^* \rangle \ge 0 \text{ for all } i =
1,2,\dots\}$. As the basis is unconditional, any $x$ can be decomposed into a
sum of an element with positive coordinates and an element with negative
coordinates, and thus the cone $K$ is generating. Let $f : X \to \mathbb{R}$ be given by

\[ f(x) = \limsup_{j \to \infty} \operatorname{sign} \langle x, x_j^* \rangle^+, \]

where $a^+ = \max\{0, a\}$. Then $f$ is $K$-monotone. Indeed, if $x \le_K y$, that is,
$y - x \in K$, then $\langle y - x, x_j^* \rangle \ge 0$, equivalently $\langle y, x_j^* \rangle \ge \langle x, x_j^* \rangle$, for all $x_j^*$.
This implies that $f(x) \le f(y)$.

Note that the sets $\{x \mid f(x) = 0\}$ and $\{x \mid f(x) = 1\}$ are dense in $X$:
we have $f(x) = 0$ for any $x \in \operatorname{span}(\{x_i\}_{i=1}^\infty)$ whereas $f(x) = 1$ for any $x \in
\operatorname{span}(\{x_i\}_{i=1}^\infty) + \sum_{i=1}^\infty 2^{-i} x_i / \|x_i\|$. As a result, $f$ is nowhere continuous,
whereas for any $x \in X$ there exists $d \in X$ such that $f(x + td)$ is continuous
at $t = 0$. In fact, $f(x + t x_i)$ is continuous in $t$, with $f(x + t x_i) = f(x)$ for any $t$
and any $x_i$. In greater generality, $f(x + y) = f(x)$ for any $y \in \operatorname{span}(\{x_i\}_{i=1}^\infty)$,
as for large enough $j$, we have $\langle x + y, x_j^* \rangle = \langle x, x_j^* \rangle$. Thus there exists a set
$D$ dense in $X$ such that, for every $x \in X$ and every $d \in D$, the function
$t \mapsto f(x + td)$ is continuous.

As in Example 2 the lower and upper semicontinuous hulls of $f$ are the
constant functions $l(x) = 0$ and $u(x) = 1$, respectively, and in particular,
they are smooth.


We now need to introduce another notion of a null set in a separable
Banach space, more general than that of a Gauss null set, described in the
comments preceding Theorem 2. A Borel set $C \subset X$ is called Haar null if
there is a Borel probability measure $\mu$ on $X$ such that $\mu(C + x) = 0$ for all
$x \in X$. Haar null sets include all Gauss null sets. The nonnegative cone in $l_p$,
$1 \le p < \infty$, is Haar null but not Gauss null (in fact it is not $\sigma$-directionally
null, a notion weaker than Gauss null), whereas the nonnegative cone in $c_0$ is
not Haar null. In the example below, we use a fact that follows from Theorem
6.4 in Benyamini and Lindenstrauss [1]: if a set $S$ is not Haar null, then $S - S$
contains a neighborhood of 0.


Example 4 (continuity, but only on a dense subset of a separable and
nonreflexive Banach space). Let $X$ be a separable, nonreflexive space, and
$Y \subset X$ a hyperplane not containing 0. Let $C \subset Y$ be a closed convex set,
with empty interior and not Haar null with respect to $Y$. In $X$, consider
$K = \mathbb{R}_+ C = \{rC \mid r \in [0,+\infty)\}$, and note that this set is a closed convex
cone: any description of $C$ as $\{x \in X \mid \langle x, a_\gamma \rangle \ge b_\gamma, \gamma \in \Gamma\}$ for $a_\gamma \in X^*$,
$b_\gamma \in \mathbb{R}$ leads to $K = \{x \in X \mid \langle x, a_\gamma \rangle \ge b_\gamma, \gamma \in \Gamma, b_\gamma = 0\}$. Moreover, $K$ has
empty interior and is not Haar null. Indeed, suppose that $\mu(K) = 0$ for some
Borel probability measure $\mu$ on $X$. Then $\mu'(C) = 0$ where $\mu'$ is a Borel probability
measure on $Y$ defined by $\mu'(A) = \mu(\mathbb{R}_+ A)$, and this contradicts $C$
being non Haar null. Also note that $K - K = X$, as $K - K$ is a cone, and,
since $K$ is not Haar null, $K - K$ is a neighborhood of 0.

Define a function $f : X \to \mathbb{R}$ by

\[ f(x) = \begin{cases} 0 & \text{if } x \in -K, \\ 1 & \text{if } x \notin -K. \end{cases} \]
We check that $f$ is $K$-monotone. The only way this could fail is if for some
$y \ge_K x$, $f(y) = 0$ and $f(x) = 1$. But $f(y) = 0$ states that $y \in -K$, $y \ge_K x$
implies $x = y - k$ for some $k \in K$, and since $-K$ is a convex cone, $x =
y + (-k) \in -K + (-K) = -K$. Thus $x \in -K$ and $f(x)$ cannot equal 1. Thus
$f$ is $K$-monotone. Additionally, as $K$ is closed and convex, the function $f$ is,
respectively, lower semicontinuous and quasiconvex (level sets $\{x \mid f(x) \le
r\}$ are closed and convex). Moreover, $f$ is generically continuous by Fort’s
theorem (see Borwein, Fitzpatrick and Kenderov [3]).

However, for every $x \in -K$, there exists $d \in X$ such that $t \mapsto f(x +
td)$ is not continuous at $t = 0$. Indeed, suppose that this failed for some
$x_0 \in -K$. Then for every $d \in X$ there exists $\varepsilon(d) > 0$ so that $|t| < \varepsilon(d)$
implies $f(x_0 + td) = 0$, and so $x_0 + td \in -K$. Thus $x_0$ is an absorbing point
of a closed convex set $-K$, and, as $X$ is barrelled, $x_0 \in \operatorname{int}(-K)$. But the
latter set is empty.

To sum up, f is continuous on a dense set but it is not continuous (and 
not differentiable) at any point of the given non Haar null set. 


The closed convex set $C \subset Y$ in the example above was chosen to be not
Haar null and to have no interior. Such a set exists in any nonreflexive space,
and in fact can be chosen to contain a translate of every compact subset of
the space; see Benyamini and Lindenstrauss [1]. In $c_0$, the nonnegative cone
is such a set (this requires the cone to be not Haar null), whereas the Haar
null nonnegative cone in $l_1$ is not. Still, in $l_1$, and in fact in any nonreflexive
space, a cone satisfying the mentioned conditions can be found.

Indeed, suppose the set $C$ of Example 4 contains translates of all compact
subsets of $Y$. We show that the constructed cone $K$ contains a translate of
every compact subset of $X$. Pick any compact $D \subset X$. Let $g \in X^*$ be such
that $Y = g^{-1}(1)$. Shift $D$ by $z_1$ so that $\min_{d \in D + z_1} g(d) = 1$, and moreover,
so that $(D + z_1) \cap C \ne \emptyset$. Pick any $v \in (D + z_1) \cap C$, and let $E \subset Y$ be the
projection of $D$ onto $Y$ in the direction $v$. Then $E$ is a compact subset of $Y$,
and thus for some $z_2 \in \ker g$, $E + z_2 \subset C$. Now note that $E + z_2$ is exactly
the projection in the direction $v$ onto $Y$ of the set $D + z_1 + z_2$, which implies
that the latter set is a subset of $C + \mathbb{R}_+ v$. Now $C + \mathbb{R}_+ v \subset K$, as $C \subset K$
and $v \in C$. In effect, $K$ contains $D + z_1 + z_2$.

We now address another question on regularity of cone-monotone func- 
tions. Weak and strong notions of lower semicontinuity for convex functions 
agree. One cannot expect the same to hold for monotone functions, as the 
following example demonstrates. 


Example 5 (Lipschitz continuity, but no weak continuity). Let $X = c_0$ with
the supremum norm. The nonnegative cone $K$ is closed, convex, has empty
interior but is not Haar null. Fix any $a \in X$ with $a > 0$ ($a$ has positive
coordinates) and define $f : X \to \mathbb{R}$ by

\[ f(x) = \frac{\|x^+\|}{\|x^+\| + \|(a - x)^+\|}. \]
The denominator is never 0, as at least one of the summands is always positive,
and thus $f$ is continuous. In fact, $\|x^+\| + \|(a - x)^+\| \ge \|a\|$, and since
both the numerator and denominator are Lipschitz, so is $f$.

Note also that $f(X) = [0,1]$, with $f(x) = 0$ if and only if $x \le 0$, and
$f(x) = 1$ if and only if $x \ge a$. We check that $f$ is monotone. For any $y \ge x$
we have $y^+ \ge x^+$, and $(a - x)^+ \ge (a - y)^+$, since $a - x \ge a - y$. Then

\[ \frac{\|y^+\|}{\|y^+\| + \|(a - y)^+\|} \ge \frac{\|x^+\|}{\|x^+\| + \|(a - y)^+\|} \ge \frac{\|x^+\|}{\|x^+\| + \|(a - x)^+\|}, \]

where the first inequality stems from the fact that for a fixed $\beta > 0$, the
function $\alpha \mapsto \alpha/(\alpha + \beta)$ is nondecreasing. Thus $f$ is monotone. Let $\{e_n\}_{n=1}^\infty$
be the standard unit vectors in $X$. Notice that for any fixed $\alpha > 0$ and
large enough $n$, we have $\|(x - \alpha e_n)^+\| = \|x^+\|$ and $\|(a - x + \alpha e_n)^+\| =
\max\{\|(a - x)^+\|, \alpha\}$. In effect,

\[ f(x - \alpha e_n) = \frac{\|(x - \alpha e_n)^+\|}{\|(x - \alpha e_n)^+\| + \|(a - x + \alpha e_n)^+\|}
   \to \frac{\|x^+\|}{\|x^+\| + \max\{\|(a - x)^+\|, \alpha\}} \]

as $n \to \infty$. Note that the last expression is less than $f(x)$ whenever $\|x^+\| > 0$
and $\|(a - x)^+\| < \alpha$. Similar analysis leads to

\[ f(x + \alpha e_n) \to \frac{\max\{\|x^+\|, \alpha\}}{\max\{\|x^+\|, \alpha\} + \|(a - x)^+\|}, \]

with the limit greater than $f(x)$ when $\|(a - x)^+\| > 0$ and $\|x^+\| < \alpha$. For a
given $\alpha$, the vectors $\alpha e_n$ converge weakly to 0. The constant $\alpha$ can be fixed
arbitrarily large, and thus the function $f$ is not weakly lower semicontinuous
at any $x$ with $\|x^+\| > 0$ (equivalently $x \notin -K$), and not weakly upper
semicontinuous at any $x$ with $\|(a - x)^+\| > 0$ (equivalently $x \notin a + K$).

Consider any $x$ with $x_n < 0$ for all $n$. It is easy to verify that

\[ \lim_{t \to 0} \frac{f(x + th) - f(x)}{t} = 0 \]

for all $h \in c_{00}$, that is, for sequences $h$ with finitely many nonzero entries (in
fact, the difference quotient is then 0 for all small enough $t$). As $c_{00}$ is dense
in $X$, and $f$ is Lipschitz continuous, $f$ is Gâteaux differentiable at $x$, with
the derivative equal to $0 \in X^*$. Similarly, $f$ has Gâteaux derivative $0 \in X^*$
at every $x$ such that $x_n > a_n$ for all $n$.
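
The limiting value of $f$ along the weakly null perturbations in Example 5 can be illustrated on a finite truncation. The following is a rough sketch only: the vector `a` (with entries $1/n$), the test point `x`, the truncation length `N` and the constant `alpha` are hypothetical choices, and a finite truncation can only approximate the behavior in $c_0$. The sketch compares $f(x - \alpha e_N)$ with the limit expression $\|x^+\| / (\|x^+\| + \max\{\|(a - x)^+\|, \alpha\})$.

```python
def pos_norm(v):
    # ||v^+|| in the supremum norm: the largest positive part of a coordinate
    return max(max(c, 0.0) for c in v)

def f(x, a):
    # f(x) = ||x^+|| / (||x^+|| + ||(a - x)^+||)
    num = pos_norm(x)
    den = num + pos_norm([ai - xi for ai, xi in zip(a, x)])
    return num / den

N = 5000
a = [1.0 / n for n in range(1, N + 1)]   # hypothetical a > 0, entries tend to 0
x = [0.5, -0.3, 0.2] + [0.0] * (N - 3)   # hypothetical test point
alpha = 2.0                              # hypothetical constant

shifted = list(x)
shifted[-1] -= alpha                     # x - alpha * e_N

x_plus = pos_norm(x)
ax_plus = pos_norm([ai - xi for ai, xi in zip(a, x)])
limit = x_plus / (x_plus + max(ax_plus, alpha))

print(f(x, a), f(shifted, a), limit)
```

For this choice $\|x^+\| > 0$ and $\|(a - x)^+\| < \alpha$, so the limit value lies strictly below $f(x)$, which is the mechanism behind the failure of weak lower semicontinuity.
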


Theorem 2 of Borwein, Burke and Lewis states that functions monotone
with respect to a cone with interior are Gâteaux differentiable outside a Gauss
null set. In the example below, we show a failure of that conclusion for a cone
with empty interior, even in a Hilbert space.
We first recall that the nonnegative cone in $c_0$ is not Haar null, and so
is not Gauss null, whereas those in $l_p$, $1 \le p < \infty$, are Haar null but not
Gauss null. To see that the nonnegative cone in $l_2$ is not Gauss null, observe
for example that it contains the interval

\[ J = \{x \in l_2 \mid 0 \le x_n \le 1/8^n, \ n = 1,2,\dots\} \]

and apply the fact that the closed convex hull of any sequence that is
norm-convergent to 0 and has dense span is not Gauss null (Theorem 6.24 in
Benyamini and Lindenstrauss). The non Gauss null interval $J$ will be used in
the example below.


Example 6 (Hölder continuity, lack of Gâteaux differentiability). We show
that the Hölder continuous function

\[ f(x) = \sqrt{\|x^+\|}, \]

monotone with respect to the nonnegative cone $K$, fails to be Gâteaux differentiable
at any point of $-K$, both in $c_0$ with the supremum norm and in
the Hilbert space $l_2$ with the standard $l_2$-norm.

We discuss the $c_0$ case first. Pick any $x \in -K$. If $x_n = 0$ for some $n$,
then considering $x + te_n$ shows that $f$ is not Gâteaux differentiable (the
directional derivative in the direction of $e_n$ is infinite). Suppose that $x_n < 0$
for all $n$. Let $h$ be given by $h_n = (-x_n)^{1/3}$, and consider $t_k = 2(-x_k)^{2/3}$
converging to 0 as $k \to \infty$. We have

\[ \frac{f(x + t_k h) - f(x)}{t_k} = \frac{f(x + t_k h)}{t_k} \ge \frac{\sqrt{(x_k + t_k h_k)^+}}{t_k}
   = \frac{(-x_k)^{1/2}}{2(-x_k)^{2/3}} = \frac{1}{2(-x_k)^{1/6}} \]

and the last expression diverges to $+\infty$ as $k \to \infty$. Thus $f$ is not differentiable
at $x$.
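
The divergence of the difference quotient in the $c_0$ case can be observed on a finite truncation. This is a sketch under stated assumptions: the point `x` with entries $-1/n^2$, the truncation length and the helper name `quotient` are hypothetical choices made for illustration; the code uses 0-based indices, so `k` in the code corresponds to $k + 1$ in the text.

```python
import math

def f(x):
    # f(x) = sqrt(||x^+||) with the supremum norm, on a finite truncation of x
    return math.sqrt(max(max(v, 0.0) for v in x))

def quotient(x, k):
    # Difference quotient (f(x + t_k h) - f(x)) / t_k with
    # h_n = (-x_n)^(1/3) and t_k = 2 (-x_k)^(2/3); k is a 0-based index
    h = [(-v) ** (1.0 / 3.0) for v in x]
    t = 2.0 * (-x[k]) ** (2.0 / 3.0)
    shifted = [v + t * hv for v, hv in zip(x, h)]
    return (f(shifted) - f(x)) / t

# Hypothetical point of -K with strictly negative entries: x_n = -1/n^2
x = [-1.0 / (n * n) for n in range(1, 2001)]

for k in (0, 9, 99, 999):
    lower_bound = 1.0 / (2.0 * (-x[k]) ** (1.0 / 6.0))
    print(k + 1, quotient(x, k), lower_bound)
```

Each printed quotient should be no smaller than the corresponding bound $1/(2(-x_k)^{1/6})$, up to rounding, and the bound grows as $x_k$ approaches 0.
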

We now turn to $l_2$, and first show that $f$ fails to be Gâteaux differentiable
on the non Gauss null interval

\[ -J = \{x \in l_2 \mid -1/8^n \le x_n \le 0, \ n = 1,2,\dots\}. \]

Indeed, for any $x \in -J$, consider $h$ with $h_n = 1/2^n$ and $t_k = 1/2^k$. Then

\[ \frac{f(x + t_k h) - f(x)}{t_k} \ge \frac{\sqrt{(x_k + t_k h_k)^+}}{t_k} \ge \frac{\sqrt{1/4^k - 1/8^k}}{1/2^k} = \sqrt{1 - 1/2^k}, \]

and, if the function was Gâteaux differentiable at $x$, the directional derivative
in the direction $h$ would be at least 1. But this is impossible, as $x$ provides a
global minimum for $f$.
To see that $f$ fails to be Gâteaux differentiable at any point $x \in -K$, note
that for some sequence $n_k$, we have $-1/8^k \le x_{n_k} \le 0$. An argument as above,
with $h_{n_k} = 1/2^k$ and $h_n = 0$ otherwise, leads to the desired conclusion.


A slight variation on Example 6 leads to a continuous function on $c_0$,
monotone with respect to the nonnegative cone $K$, but not Gâteaux differentiable
at any point of a dense set $c_{00} - K$ (any point with finite number of
positive coefficients). Let $\{k_i\}_{i=1}^\infty$ be dense in $K$, and define

\[ f(x) = \sum_{i=1}^\infty \frac{\sqrt{\|(x - k_i)^+\|}}{2^i}. \]

Monotonicity is easy to check, and $f$ is continuous as $\|(x - k_i)^+\| \le \|x^+\|$. If
for some $i$, $x \le_K k_i$, then $f$ is not Gâteaux differentiable at $x$. Indeed, in
such a case we have, for $h \ge_K 0$ and $t > 0$,

\[ \frac{f(x + th) - f(x)}{t} \ge \frac{\sqrt{\|(x - k_i + th)^+\|} - \sqrt{\|(x - k_i)^+\|}}{2^i t}. \]

Picking $t$ and $h$ as in Example 6 (for $x - k_i$) leads to the desired conclusion.

The close relationship of cone-monotone and Lipschitz functions suggests 
that badly behaved cone-monotone functions will exist in spaces where irreg- 
ular Lipschitz functions do. For example, 


\[ p(x) = \limsup_{n \to \infty} |x_n| \]

is a nowhere Gâteaux differentiable continuous seminorm in $l_\infty$, see Phelps
[8]. Arguments similar to those given by Phelps show that the function $f :
l_\infty \to \mathbb{R}$, given by

\[ f(x) = \limsup_{n \to \infty} x_n^+ - \limsup_{n \to \infty} x_n^-, \]

though monotone with respect to the nonnegative cone, is not Gâteaux differentiable
outside $c_0$, that is, at any $x$ for which at least one of $\limsup x_n^+$
and $\limsup x_n^-$ is positive. (Recall $a^- = -a$ when $a < 0$ and $a^- = 0$ otherwise.)
Indeed, suppose that $\limsup x_n^+ = \alpha > 0$, and choose a subsequence
$n_k$ so that $\lim x_{n_k} = \alpha$. Define $h$ by $h_{n_{2i}} = 1$, $h_{n_{2i+1}} = -1$, $i = 1,2,\dots$,
and $h_n = 0$ for $n \ne n_k$, $k = 1,2,\dots$. Notice that for $t$ close enough to 0,
$\limsup (x + th)_n^+ = \alpha + |t|$ (if $t > 0$ then $(x + th)_{n_{2i}} = x_{n_{2i}} + t$, $i = 1,2,\dots$,
and these terms provide the $\limsup (x + th)_n^+$; if $t < 0$ one should consider
$(x + th)_{n_{2i+1}} = x_{n_{2i+1}} - t$). On the other hand, $\limsup (x + th)_n^- = \limsup x_n^-$,
and in effect, the limit of $(f(x + th) - f(x))/t$ as $t \to 0$ does not exist. The
case of $\limsup x_n^- = \alpha > 0$ can be treated in a symmetric fashion.

Borwein, Burke and Lewis [2] show that any directionally Lipschitz func- 
tion decomposes (locally) into a linear function, and a cone-monotone one 
(with respect to a cone with interior). Consequently, nondifferentiable 
Lipschitz functions lead to local examples of nonregular cone-monotone functions.
On spaces where there exist nowhere differentiable globally Lipschitz
functions, like $l_\infty$ or $l_1(\Gamma)$ with $\Gamma$ uncountable, one can in fact construct
nowhere Gâteaux differentiable cone-monotone functions; we carry this out
explicitly in our final example. We note that the technique of Example 7 can be used
to construct cone-monotone functions (with respect to cones with nonempty
interiors) from any given Lipschitz function, on spaces like $c_0$ and $l_p$. Also
note that spaces which admit a nowhere Fréchet differentiable convex function
(spaces which are not Asplund spaces) also admit a nowhere Fréchet differentiable
renorm (and so a nowhere Fréchet differentiable globally Lipschitz function); the
situation is not well understood for Gâteaux differentiability.


Example 7 (a nowhere Gâteaux differentiable function on $l_\infty$). As discussed
above, $p(x) = \limsup_{n \to \infty} |x_n|$ is nowhere Gâteaux differentiable on $l_\infty$. We
use this fact to construct a nowhere Gâteaux differentiable function, monotone
with respect to a cone with interior.

Let $e_1$ be the first of the standard unit vectors in $l_\infty$, and consider the
function $f(x) = p(x) + \langle e_1, x \rangle = p(x) + x_1$ and the cone $K = \mathbb{R}_+ (B_{1/2}(e_1))$
(the cone generated by the closed ball of radius 1/2 centered at $e_1$). Then $K$
has interior and $f$ is $K$-monotone. Indeed, as for any $x \in B_{1/2}(e_1)$, $x_1 \ge 1/2$
while $|x_r| \le 1/2$, $r = 2,3,\dots$, we have, for any $k \in K$, $\|k\| = k_1$ (for the
supremum norm). As $p(x)$ is Lipschitz continuous, with constant 1, we obtain,
for any $x \in X$, $k \in K$,

\[ p(x + k) - p(x) \ge -\|k\| \ge -\langle e_1, k \rangle, \]

which translates to

\[ p(x + k) + \langle e_1, x + k \rangle \ge p(x) + \langle e_1, x \rangle, \]

and this means that $f$ is $K$-monotone. As $p$ is nowhere Gâteaux differentiable,
so is $f$.


Chapter 2


Duality and a Farkas lemma 
for integer programs 


Jean B. Lasserre 


Abstract We consider the integer program
$\max\{c'x \mid Ax = b,\ x \in \mathbb{N}^n\}$. A formal parallel between
linear programming and continuous integration on one side, and discrete
summation on the other side, shows that a natural duality for integer programs
can be derived from the Z-transform and Brion and Vergne's counting formula.
Along the same lines, we also provide a discrete Farkas lemma and show that
the existence of a nonnegative integral solution $x \in \mathbb{N}^n$ to
$Ax = b$ can be tested via a linear program.


Key words: Integer programming, counting problems, duality 


2.1 Introduction 


In this paper we are interested in a comparison between linear and integer
programming, particularly from a duality perspective. So far, and to the best
of our knowledge, the duality results available for integer programs are
obtained via the use of subadditive functions as in Wolsey [21], for example,
and the smaller class of Chvátal and Gomory functions as in Blair and
Jeroslow [6], for example (see also Schrijver [19, pp. 346-353]). For more
details the interested reader is referred to [1, 6, 19, 21] and the many
references therein. However, as subadditive, Chvátal and Gomory functions are
only defined implicitly from their properties, the resulting dual problems
defined in [6] or [21] are conceptual in nature, and Gomory functions are used
to generate valid inequalities for the primal problem.

We claim that another natural duality for integer programs can be derived
from the Z-transform (or generating function) associated with the counting
version (defined below) of the integer program. Results for counting problems,
notably by Barvinok [4], Barvinok and Pommersheim [5], Khovanskii and
Pukhlikov [12], and in particular, Brion and Vergne's counting formula [7],
will prove especially useful.

For this purpose, we will consider the four related problems $P$, $P_d$, $I$
and $I_d$ displayed in the diagram below, in which the integer program $P_d$
appears in the upper right corner.


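\[
\begin{array}{lcl}
\textbf{Continuous Optimization} & & \textbf{Discrete Optimization} \\[4pt]
P:\ f(b,c) := \max\{c'x \mid Ax = b,\ x \in \mathbb{R}^n_+\}
  & \longleftrightarrow
  & P_d:\ f_d(b,c) := \max\{c'x \mid Ax = b,\ x \in \mathbb{N}^n\} \\[4pt]
\updownarrow & & \updownarrow \\[4pt]
\textbf{Integration} & & \textbf{Summation} \\[4pt]
I:\ \hat f(b,c) := \displaystyle\int_{\Omega(b)} e^{c'x}\,ds
  & \longleftrightarrow
  & I_d:\ \hat f_d(b,c) := \displaystyle\sum_{x \in \Omega(b)} e^{c'x} \\[4pt]
\Omega(b) := \{x \in \mathbb{R}^n_+ \mid Ax = b\}
  & 
  & \Omega(b) := \{x \in \mathbb{N}^n \mid Ax = b\}
\end{array}
\]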


Problem $I$ (in which $ds$ denotes the Lebesgue measure on the affine variety
$\{x \in \mathbb{R}^n \mid Ax = b\}$ that contains the convex polyhedron
$\Omega(b)$) is the integration version of the linear program $P$, whereas
Problem $I_d$ is the counting version of the (discrete) integer program $P_d$.

Why should these four problems help in analyzing $P_d$? Because first, $P$
and $I$, as well as $P_d$ and $I_d$, are simply related, and in the same
manner. Next, as we will see, the nice and complete duality results available
for $P$, $I$ and $I_d$ extend in a natural way to $P_d$.


2.1.1 Preliminaries 


In fact, $I$ and $I_d$ are the respective formal analogues in the algebra
$(+, \times)$ of $P$ and $P_d$ in the algebra $(\oplus, \times)$, where in the
latter, the addition $a \oplus b$ stands for $\max(a, b)$; indeed, the "max" in
$P$ and $P_d$ can be seen as an idempotent integral (or Maslov integral) in
this algebra (see, for example, Litvinov et al. [17]). For a nice parallel
between results in probability ($(+, \times)$ algebra) and optimization
($(\max, +)$ algebra), the reader is referred to Baccelli et al. [8,
Section 9].
Moreover, $P$ and $I$, as well as $P_d$ and $I_d$, are simply related via
\[
e^{f(b,c)} = \lim_{r \to \infty} \hat f(b, rc)^{1/r}; \qquad
e^{f_d(b,c)} = \lim_{r \to \infty} \hat f_d(b, rc)^{1/r}. \tag{2.1}
\]
Equivalently, by continuity of the logarithm,
\[
f(b,c) = \lim_{r \to \infty} \frac{1}{r} \ln \hat f(b, rc); \qquad
f_d(b,c) = \lim_{r \to \infty} \frac{1}{r} \ln \hat f_d(b, rc), \tag{2.2}
\]
a relationship that will be useful later.
Next, concerning duality, the standard Legendre-Fenchel transform which
yields the usual dual LP of $P$,
\[
P^* \to \min_{\lambda \in \mathbb{R}^m} \{b'\lambda \mid A'\lambda \ge c\}, \tag{2.3}
\]
has a natural analogue for integration, the Laplace transform, and thus the
inverse Laplace transform problem (that we call $I^*$) is the formal analogue
of $P^*$ and provides a nice duality for integration (although not usually
presented in these terms). Finally, the Z-transform is the obvious analogue for
summation of the Laplace transform for integration. We will see that in the
light of recent results in counting problems, it is possible to establish a
nice duality for $I_d$ in the same vein as the duality for (continuous)
integration and by (2.2), it also provides a powerful tool for analyzing the
integer program $P_d$.
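
The discrete limit in (2.2) can be checked numerically on a toy instance. The
following minimal sketch assumes a small single-constraint problem chosen purely
for illustration (the data `A`, `b`, `c` and all helper names are assumptions,
not taken from the text). It enumerates the feasible lattice points by brute
force and compares $f_d(b,c)$ with $\frac{1}{r} \ln \hat f_d(b, rc)$, using a
log-sum-exp shift to avoid overflow.

```python
import itertools
import math

def lattice_points(A, b, bound):
    """All x in N^n with entries <= bound and A x = b (brute force)."""
    n = len(A[0])
    return [x for x in itertools.product(range(bound + 1), repeat=n)
            if all(sum(row[j] * x[j] for j in range(n)) == bi
                   for row, bi in zip(A, b))]

def f_d(A, b, c, bound):
    """Optimal value of the integer program P_d."""
    return max(sum(cj * xj for cj, xj in zip(c, x))
               for x in lattice_points(A, b, bound))

def scaled_log_count(A, b, c, r, bound):
    """(1/r) * ln of the sum of exp(r * c'x) over the feasible lattice points."""
    values = [r * sum(cj * xj for cj, xj in zip(c, x))
              for x in lattice_points(A, b, bound)]
    top = max(values)  # shift so that the largest exponent is 0
    return (top + math.log(sum(math.exp(v - top) for v in values))) / r

# Illustrative data (assumption): 2*x1 + 3*x2 + x3 = 7 over N^3.
A, b, c = [[2, 3, 1]], [7], [3, 5, 1]
exact = f_d(A, b, c, bound=7)
approx = [scaled_log_count(A, b, c, r, bound=7) for r in (1, 10, 100)]
```

In this instance the values in `approx` decrease toward `exact` as `r` grows,
which is the behavior that (2.2) describes.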


2.1.2 Summary of content 


(a) We first review the duality principles that are available for $P$, $I$ and
$I_d$ and underline the parallels and connections between them. In particular,
a fundamental difference between the continuous and discrete cases is that in
the former, the data appear as coefficients of the dual variables whereas in
the latter, the same data appear as exponents of the dual variables.
Consequently, the (discrete) Z-transform has many more poles than the Laplace
transform. Whereas the Laplace transform has only real poles, the Z-transform
has additional complex poles associated with each real pole, which induces some
periodic behavior, a well-known phenomenon in number theory where the
Z-transform (or generating function) is a standard tool (see, for example,
Iosevich [11], Mitrinovic et al. [18]). So, although the procedure of inverting
the Laplace transform or the Z-transform (that is, solving the dual problems
$I^*$ and $I_d^*$) is basically of the same nature, that is, a complex
integral, it is significantly more complicated in the discrete case, due to the
presence of these additional complex poles.

(b) Then we use results from (a) to analyze the discrete optimization
problem $P_d$. Central to the analysis is Brion and Vergne's inverse formula
[7] for counting problems. In particular, we provide a closed-form expression
for the optimal value $f_d(b,c)$ which highlights the special role played by
the so-called reduced costs of the linear program $P$ and the complex poles of
the Z-transform associated with each basis of the linear program $P$. We also
show that each basis $B$ of the linear program $P$ provides exactly $\det(B)$
complex dual vectors in $\mathbb{C}^m$, the complex (periodic) analogues for
$P_d$ of the unique dual vector in $\mathbb{R}^m$ for $P$, associated with the
basis $B$. As in linear programming (but in a more complicated way), the
optimal value $f_d(b,c)$ of $P_d$ can be found by inspection of (certain sums
of) reduced costs associated with each vertex of $\Omega(b)$.

(c) We also provide a discrete Farkas lemma for the existence of nonnegative
integral solutions $x \in \mathbb{N}^n$ to $Ax = b$. Its form also confirms the
special role of the Z-transform described earlier. Moreover, it allows us to
check the existence of a nonnegative integral solution by solving a related
linear program.


2.2 Duality for the continuous problems P and I 


With $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, let
$\Omega(b) \subset \mathbb{R}^n$ be the convex polyhedron
\[
\Omega(b) := \{x \in \mathbb{R}^n \mid Ax = b;\ x \ge 0\}, \tag{2.4}
\]
and consider the standard linear program (LP)
\[
P:\quad f(b,c) := \max\{c'x \mid Ax = b;\ x \ge 0\} \tag{2.5}
\]
with $c \in \mathbb{R}^n$, and its associated integration version
\[
I:\quad \hat f(b,c) := \int_{\Omega(b)} e^{c'x}\,ds \tag{2.6}
\]
where $ds$ is the Lebesgue measure on the affine variety
$\{x \in \mathbb{R}^n \mid Ax = b\}$ that contains the convex polyhedron
$\Omega(b)$.

For a vector $c$ and a matrix $A$ we denote by $c'$ and $A'$ their respective
transposes. We also use both the notation $c'x$ and $\langle c, x \rangle$ for
the usual scalar product of two vectors $c$ and $x$. We assume that both
$A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ have rational entries.


2.2.1 Duality for P 


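It is well known that the standard duality for (2.5) is obtained from the
Legendre-Fenchel transform $F(\cdot, c) : \mathbb{R}^m \to \mathbb{R}$ of the
value function $f(b,c)$ with respect to $b$, that is, here (as
$y \mapsto f(y,c)$ is concave)
\[
\lambda \mapsto F(\lambda, c) := \inf_{y \in \mathbb{R}^m}
\langle \lambda, y \rangle - f(y, c), \tag{2.7}
\]
which yields the usual dual LP problem
\[
P^* \to \inf_{\lambda \in \mathbb{R}^m} \langle \lambda, b \rangle - F(\lambda, c)
= \min_{\lambda \in \mathbb{R}^m} \{b'\lambda \mid A'\lambda \ge c\}. \tag{2.8}
\]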
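
As a reading aid, the step from (2.7) to (2.8) can be spelled out. The short
derivation below is a standard computation that uses only the definitions
above; it is added here for convenience and is not part of the original text.
\[
F(\lambda, c)
= \inf_{y \in \mathbb{R}^m} \Big\{ \langle \lambda, y \rangle
  - \max_{x \ge 0,\ Ax = y} c'x \Big\}
= \inf_{x \ge 0} \, (A'\lambda - c)'x
= \begin{cases} 0 & \text{if } A'\lambda \ge c, \\ -\infty & \text{otherwise,} \end{cases}
\]
so that $\langle \lambda, b \rangle - F(\lambda, c)$ equals $b'\lambda$ on the
set $\{\lambda \mid A'\lambda \ge c\}$ and $+\infty$ elsewhere, and the infimum
over $\lambda$ is exactly the dual LP.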


2.2.2 Duality for integration


Similarly, the analogue for integration of the Fenchel transform is the
two-sided Laplace transform $\hat F(\cdot, c) : \mathbb{C}^m \to \mathbb{C}$ of
$\hat f(b,c)$, given by
\[
\lambda \mapsto \hat F(\lambda, c) := \int_{\mathbb{R}^m}
e^{-\langle \lambda, y \rangle} \hat f(y, c)\,dy. \tag{2.9}
\]
It turns out that developing (2.9) yields
\[
\hat F(\lambda, c) = \prod_{k=1}^{n} \frac{1}{(A'\lambda - c)_k}
\quad \text{whenever } \mathrm{Re}(A'\lambda - c) > 0 \tag{2.10}
\]
(see for example [7, p. 798] or [13]). Thus $\hat F(\lambda, c)$ is
well-defined provided
\[
\mathrm{Re}(A'\lambda - c) > 0, \tag{2.11}
\]
and $\hat f(b,c)$ can be computed by solving the inverse Laplace transform
problem, which we call the (integration) dual problem $I^*$ of $I$, that is,
\[
I^*:\quad \hat f(b,c) := \frac{1}{(2\pi i)^m}
\int_{\gamma - i\infty}^{\gamma + i\infty}
e^{\langle b, \lambda \rangle} \hat F(\lambda, c)\,d\lambda \tag{2.12}
\]
where $\gamma \in \mathbb{R}^m$ is fixed and satisfies $A'\gamma - c > 0$.
Incidentally, observe that the domain of definition (2.11) of
$\hat F(\cdot, c)$ is precisely the interior of the feasible set of the dual
problem $P^*$ in (2.8). We will comment more on this and the link with the
logarithmic barrier function for linear programming (see Section 2.2.5 below).

We may indeed call $I^*$ a dual problem of $I$ as it is defined on the space
$\mathbb{C}^m$ of variables $\{\lambda_k\}$ associated with the nontrivial
constraints $Ax = b$; notice that we also retrieve the standard "ingredients"
of the dual optimization problem $P^*$, namely $b'\lambda$ and
$A'\lambda - c$.


2.2.3 Comparing $P$, $P^*$ and $I$, $I^*$


One may compute $\hat f(b,c)$ directly using Cauchy residue techniques. That
is, one may compute the integral (2.12) by successive one-dimensional complex
integrals with respect to one variable $\lambda_k$ at a time (for example
starting with $\lambda_1, \lambda_2, \dots$) and by repeated application of
Cauchy's Residue Theorem [8]. This is possible because the integrand is a
rational fraction, and after application of Cauchy's Residue Theorem at step
$k$ with respect to $\lambda_k$, the output is still a rational fraction of the
remaining variables $\lambda_{k+1}, \dots, \lambda_m$. For more details the
reader is referred to Lasserre and Zeron [13]. It is not difficult to see that
the whole procedure is a summation of partial results, each of them
corresponding to a (multi-pole) vector $\hat\lambda \in \mathbb{R}^m$ that
annihilates $m$ terms of the $n$ products in the denominator of the integrand.

This is formalized in the nice formula of Brion and Vergne [7, Proposition
3.3, p. 820] that we describe below. For the interested reader, there are
several other nice closed-form formulae for $\hat f(b,c)$, notably by
Barvinok [4], Barvinok and Pommersheim [5], and Khovanskii and Pukhlikov [12].


2.2.4 The continuous Brion and Vergne formula 


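The material in this section is taken from [7]. To explain the closed-form
formula of Brion and Vergne we need some notation.

Write the matrix $A \in \mathbb{R}^{m \times n}$ as $A = [A_1 | \dots | A_n]$
where $A_j \in \mathbb{R}^m$ denotes the $j$-th column of $A$ for all
$j = 1, \dots, n$. With $\Delta := (A_1, \dots, A_n)$ let
$C(\Delta) \subset \mathbb{R}^m$ be the closed convex cone generated by
$\Delta$. Let $\Lambda \subseteq \mathbb{Z}^m$ be a lattice.

A subset $\sigma$ of $\{1, \dots, n\}$ is called a basis of $\Delta$ if the
sequence $\{A_j\}_{j \in \sigma}$ is a basis of $\mathbb{R}^m$, and the set of
bases of $\Delta$ is denoted by $\mathcal{B}(\Delta)$. For
$\sigma \in \mathcal{B}(\Delta)$ let $C(\sigma)$ be the cone generated by
$\{A_j\}_{j \in \sigma}$. With any $y \in C(\Delta)$ associate the intersection
of all cones $C(\sigma)$ which contain $y$. This defines a subdivision of
$C(\Delta)$ into polyhedral cones. The interiors of the maximal cones in this
subdivision are called chambers in Brion and Vergne [7]. For every
$y \in \gamma$, the convex polyhedron $\Omega(y)$ in (2.4) is simple. Next, for
a chamber $\gamma$ (whose closure is denoted by $\bar\gamma$), let
$\mathcal{B}(\Delta, \gamma)$ be the set of bases $\sigma$ such that $\gamma$
is contained in $C(\sigma)$, and let $\mu(\sigma)$ denote the volume of the
convex polytope $\{\sum_{j \in \sigma} t_j A_j \mid 0 \le t_j \le 1\}$
(normalized so that $\mathrm{vol}(\mathbb{R}^m / \Lambda) = 1$). Observe that
for $b \in \bar\gamma$ and $\sigma \in \mathcal{B}(\Delta, \gamma)$ we have
$b = \sum_{j \in \sigma} x_j(\sigma) A_j$ for some $x_j(\sigma) \ge 0$.
Therefore the vector $x(\sigma) \in \mathbb{R}^n_+$, with $x_j(\sigma) = 0$
whenever $j \notin \sigma$, is a vertex of the polytope $\Omega(b)$. In linear
programming terminology, the bases $\sigma \in \mathcal{B}(\Delta, \gamma)$
correspond to the feasible bases of the linear program $P$. Denote by $V$ the
subspace $\{x \in \mathbb{R}^n \mid Ax = 0\}$. Finally, given
$\sigma \in \mathcal{B}(\Delta)$, let $\pi^\sigma \in \mathbb{R}^m$ be the row
vector that solves $\pi^\sigma A_j = c_j$ for all $j \in \sigma$. A vector
$c \in \mathbb{R}^n$ is said to be regular if $c_j - \pi^\sigma A_j \ne 0$ for
all $\sigma \in \mathcal{B}(\Delta)$ and all $j \notin \sigma$. Let
$c \in \mathbb{R}^n$ be regular with $-c$ in the interior of the dual cone
$(\mathbb{R}^n_+ \cap V)^*$ (which is the case if $A'u > c$ for some
$u \in \mathbb{R}^m$). Then, with $\Lambda = \mathbb{Z}^m$, Brion and Vergne's
formula [7, Proposition 3.3, p. 820] states that
\[
\hat f(b,c) = \sum_{\sigma \in \mathcal{B}(\Delta, \gamma)}
\frac{e^{\langle c, x(\sigma) \rangle}}
     {\mu(\sigma) \prod_{k \notin \sigma} (-c_k + \pi^\sigma A_k)}
\qquad \forall\, b \in \bar\gamma. \tag{2.13}
\]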
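Notice that in linear programming terminology, $c_k - \pi^\sigma A_k$ is simply
the so-called reduced cost of the variable $x_k$, with respect to the basis
$\{A_j\}_{j \in \sigma}$. Equivalently, we can rewrite (2.13) as
\[
\hat f(b,c) = \sum_{x(\sigma):\ \text{vertex of } \Omega(b)}
\frac{e^{\langle c, x(\sigma) \rangle}}
     {\mu(\sigma) \prod_{k \notin \sigma} (-c_k + \pi^\sigma A_k)}. \tag{2.14}
\]
Thus $\hat f(b,c)$ is a weighted summation over the vertices of $\Omega(b)$
whereas $f(b,c)$ is a maximization over the vertices (or a summation with
$\oplus \equiv \max$).

So, if $c$ is replaced by $rc$ and $x(\sigma^*)$ denotes the vertex of
$\Omega(b)$ at which $c'x$ is maximized, we obtain
\[
\hat f(b, rc)^{1/r} = e^{\langle c, x(\sigma^*) \rangle}
\left[ \sum_{x(\sigma):\ \text{vertex of } \Omega(b)}
\frac{e^{r \langle c,\, x(\sigma) - x(\sigma^*) \rangle}}
     {r^{n-m} \mu(\sigma) \prod_{k \notin \sigma} (-c_k + \pi^\sigma A_k)}
\right]^{1/r}
\]
from which it easily follows that
\[
\lim_{r \to \infty} \ln \hat f(b, rc)^{1/r}
= \langle c, x(\sigma^*) \rangle
= \max_{x \in \Omega(b)} \langle c, x \rangle = f(b,c),
\]
as indicated in (2.2).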
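
To make the vertex sum (2.14) concrete, here is a minimal sketch for the
special case of a single constraint $a'x = b$ with all $a_j > 0$ (so $m = 1$).
In that case each basis is a single index $j$, the vertex is
$x(\sigma) = (b/a_j) e_j$, $\pi^\sigma = c_j / a_j$ and $\mu(\sigma) = a_j$.
The function names and test data are illustrative assumptions. The quadrature
check further assumes $a = (1, 1, 1)$ and identifies $ds$ with $dx_1\,dx_2$ on
the triangle $x_1, x_2 \ge 0$, $x_1 + x_2 \le b$.

```python
import math

def hat_f_vertices(a, b, c):
    """Vertex sum (2.14) for one constraint a'x = b with all a_j > 0."""
    total = 0.0
    for j, (aj, cj) in enumerate(zip(a, c)):
        pi = cj / aj                     # dual value attached to the basis {j}
        denom = aj                       # mu(sigma) = a_j
        for k, (ak, ck) in enumerate(zip(a, c)):
            if k != j:
                denom *= -ck + pi * ak   # minus the reduced cost of x_k
        total += math.exp(cj * b / aj) / denom
    return total

def hat_f_quadrature(b, c, steps=200):
    """Midpoint rule on the triangle for a = (1, 1, 1); x3 = b - x1 - x2."""
    h = b / steps
    total = 0.0
    for i in range(steps):
        for j in range(steps - i):
            # Cells cut by the hypotenuse contribute half of their area.
            weight = 0.5 if i + j == steps - 1 else 1.0
            x1, x2 = (i + 0.5) * h, (j + 0.5) * h
            total += weight * math.exp(c[0] * x1 + c[1] * x2
                                       + c[2] * (b - x1 - x2))
    return total * h * h

c3 = (-1.0, -2.0, -3.5)                  # regular: the c_j / a_j are distinct
by_vertices = hat_f_vertices((1, 1, 1), 2.0, c3)
by_quadrature = hat_f_quadrature(2.0, c3)
two_var = hat_f_vertices((1, 1), 3.0, (-1.0, -2.0))
```

For two variables the vertex sum reduces to
$(e^{c_1 b} - e^{c_2 b}) / (c_1 - c_2)$, which is what integrating
$e^{c_1 x_1 + c_2 (b - x_1)}$ over $[0, b]$ gives directly.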


2.2.5 The logarithmic barrier function 


It is also worth noticing that
\[
\hat f(b, rc) = \frac{1}{(2\pi i)^m}
\int_{\gamma_r - i\infty}^{\gamma_r + i\infty}
\frac{e^{\langle b, \lambda \rangle}}
     {\prod_{k=1}^{n} (A'\lambda - rc)_k}\,d\lambda
\]
with $\gamma_r = r\gamma$ and we can see that (up to the constant
$(m - n) \ln r$) the logarithm of the integrand is simply the well-known
logarithmic barrier function
\[
\lambda \mapsto \phi_\mu(\lambda, b) = \mu^{-1} \langle b, \lambda \rangle
- \sum_{j=1}^{n} \ln (A'\lambda - c)_j,
\]
with parameter $\mu := 1/r$, of the dual problem $P^*$ (see for example den
Hertog [9]). This should not come as a surprise as a self-concordant barrier
function $\phi_K(x)$ of a cone $K \subset \mathbb{R}^n$ is given by the
logarithm of the Laplace transform $\int_{K^*} e^{-\langle x, s \rangle}\,ds$
of its dual cone $K^*$ (see for example Güler [10], Truong and Tunçel [20]).

Thus, when $r \to \infty$, minimizing the exponential logarithmic barrier
function on its domain in $\mathbb{R}^m$ yields the same result as taking its
residues.


2.2.6 Summary 


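The parallel between $P$, $P^*$ and $I$, $I^*$ is summarized below.

\[
\begin{array}{l|l}
\textbf{Fenchel-duality} & \textbf{Laplace-duality} \\ \hline
f(b,c) := \displaystyle\max_{Ax = b;\ x \ge 0} c'x
  & \hat f(b,c) := \displaystyle\int_{Ax = b;\ x \ge 0} e^{c'x}\,ds \\[8pt]
F(\lambda, c) := \displaystyle\inf_{y \in \mathbb{R}^m} \{\lambda'y - f(y,c)\}
  & \hat F(\lambda, c) := \displaystyle\int_{\mathbb{R}^m} e^{-\lambda'y} \hat f(y,c)\,dy
    = \dfrac{1}{\prod_{k=1}^{n} (A'\lambda - c)_k} \\[8pt]
\text{with: } A'\lambda - c \ge 0
  & \text{with: } \mathrm{Re}(A'\lambda - c) > 0 \\[8pt]
f(b,c) = \displaystyle\min_{\lambda \in \mathbb{R}^m} \{\lambda'b - F(\lambda, c)\}
  & \hat f(b,c) = \dfrac{1}{(2\pi i)^m} \displaystyle\int_{\Gamma} e^{\lambda'b} \hat F(\lambda, c)\,d\lambda \\[8pt]
\phantom{f(b,c)} = \displaystyle\min_{\lambda \in \mathbb{R}^m} \{b'\lambda \mid A'\lambda \ge c\}
  & \phantom{\hat f(b,c)} = \dfrac{1}{(2\pi i)^m} \displaystyle\int_{\Gamma}
    \dfrac{e^{\lambda'b}}{\prod_{k=1}^{n} (A'\lambda - c)_k}\,d\lambda \\[8pt]
\text{Simplex algorithm} \to \text{vertices of } \Omega(b)
  & \text{Cauchy's Residue} \to \text{poles of } \hat F(\lambda, c) \\
\to \max c'x \text{ over vertices.}
  & \to \textstyle\sum e^{c'x} \text{ over vertices.}
\end{array}
\]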


2.3 Duality for the discrete problems $I_d$ and $P_d$


In the respective discrete analogues $P_d$ and $I_d$ of (2.5) and (2.6) one
replaces the positive cone $\mathbb{R}^n_+$ by $\mathbb{N}^n$ (or
$\mathbb{R}^n_+ \cap \mathbb{Z}^n$), that is, (2.5) becomes the integer program
\[
P_d:\quad f_d(b,c) := \max\{c'x \mid Ax = b;\ x \in \mathbb{N}^n\} \tag{2.15}
\]
whereas (2.6) becomes a summation over $\mathbb{N}^n \cap \Omega(b)$, that is,
\[
I_d:\quad \hat f_d(b,c) := \sum \{e^{c'x} \mid Ax = b;\ x \in \mathbb{N}^n\}. \tag{2.16}
\]

We here assume that $A \in \mathbb{Z}^{m \times n}$ and $b \in \mathbb{Z}^m$,
which implies in particular that the lattice $\Lambda := A(\mathbb{Z}^n)$ is a
sublattice of $\mathbb{Z}^m$ ($\Lambda \subseteq \mathbb{Z}^m$). Note that $b$
in (2.15) and (2.16) is necessarily in $\Lambda$.

In this section we are concerned with what we call the "dual" problem $I_d^*$
of $I_d$, the discrete analogue of the dual $I^*$ of $I$, and its link with the
discrete optimization problem $P_d$.


2.3.1 The Z-transform 


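The natural discrete analogue of the Laplace transform is the so-called
Z-transform. Therefore with $\hat f_d(b,c)$ we associate its (two-sided)
Z-transform $\hat F_d(\cdot, c) : \mathbb{C}^m \to \mathbb{C}$ defined by
\[
z \mapsto \hat F_d(z, c) := \sum_{y \in \mathbb{Z}^m} z^{-y} \hat f_d(y, c), \tag{2.17}
\]
where the notation $z^y$ with $y \in \mathbb{Z}^m$ stands for
$z_1^{y_1} \cdots z_m^{y_m}$. Applying this definition yields
\[
\begin{aligned}
\hat F_d(z, c)
&= \sum_{y \in \mathbb{Z}^m} z^{-y} \hat f_d(y, c)
 = \sum_{y \in \mathbb{Z}^m} z^{-y}
   \Bigg[ \sum_{x \in \mathbb{N}^n;\ Ax = y} e^{c'x} \Bigg] \\
&= \sum_{x \in \mathbb{N}^n} e^{c'x}
   \Big[ z_1^{-y_1} \cdots z_m^{-y_m} \Big]_{y = Ax}
 = \sum_{x \in \mathbb{N}^n} e^{c'x}\,
   z_1^{-(Ax)_1} \cdots z_m^{-(Ax)_m} \\
&= \prod_{k=1}^{n} \frac{1}{1 - e^{c_k} z_1^{-A_{1k}} z_2^{-A_{2k}} \cdots z_m^{-A_{mk}}}
 = \prod_{k=1}^{n} \frac{1}{1 - e^{c_k} z^{-A_k}},
\end{aligned} \tag{2.18}
\]
which is well-defined provided
\[
|z_1^{A_{1k}} \cdots z_m^{A_{mk}}| \ (= |z^{A_k}|) > e^{c_k}
\qquad \forall\, k = 1, \dots, n. \tag{2.19}
\]
Observe that the domain of definition (2.19) of $\hat F_d(\cdot, c)$ is the
exponential version of (2.11) for $\hat F(\cdot, c)$. Indeed, taking the real
part of the logarithm in (2.19) yields (2.11).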
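
The closed form (2.18) can be sanity-checked numerically. The sketch below
compares the product on the right-hand side of (2.18) with a truncated version
of the defining sum over $x \in \mathbb{N}^n$, at a point $z$ that satisfies
(2.19). The data and function names are illustrative assumptions, and the
truncation bound is chosen large enough for the geometric tails to be
negligible in this instance.

```python
import itertools
import math

def z_transform_closed_form(A, c, z):
    """Right-hand side of (2.18): a product over the columns A_k of A."""
    m, n = len(A), len(A[0])
    value = 1.0
    for k in range(n):
        z_pow = math.prod(z[i] ** (-A[i][k]) for i in range(m))   # z^{-A_k}
        value /= 1.0 - math.exp(c[k]) * z_pow
    return value

def z_transform_truncated(A, c, z, bound):
    """Partial sum of e^{c'x} z^{-Ax} over x in {0, ..., bound}^n."""
    m, n = len(A), len(A[0])
    total = 0.0
    for x in itertools.product(range(bound + 1), repeat=n):
        weight = math.exp(sum(c[k] * x[k] for k in range(n)))
        for i in range(m):
            weight *= z[i] ** (-sum(A[i][k] * x[k] for k in range(n)))
        total += weight
    return total

A = [[1, 2, 0], [0, 1, 1]]      # illustrative data (assumption)
c = [0.1, -0.2, 0.3]
z = [2.0, 3.0]                  # satisfies (2.19): |z^{A_k}| > e^{c_k} for all k
closed = z_transform_closed_form(A, c, z)
partial = z_transform_truncated(A, c, z, bound=40)
```

Each factor of the product is the sum of a geometric series in
$e^{c_k} z^{-A_k}$, which is why condition (2.19) is exactly what makes the
series converge.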


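2.3.2 The dual problem $I_d^*$


Therefore the value $\hat f_d(b,c)$ is obtained by solving the inverse
Z-transform problem $I_d^*$ (that we call the dual of $I_d$)
\[
\hat f_d(b,c) = \frac{1}{(2\pi i)^m}
\int_{|z_1| = \gamma_1} \cdots \int_{|z_m| = \gamma_m}
\hat F_d(z, c)\, z^{b - e_m}\,dz, \tag{2.20}
\]
where $e_m$ is the unit vector of $\mathbb{R}^m$ (all of its entries equal 1)
and $\gamma \in \mathbb{R}^m$ is a (fixed) vector that satisfies
$\gamma_1^{A_{1k}} \gamma_2^{A_{2k}} \cdots \gamma_m^{A_{mk}} > e^{c_k}$ for
all $k = 1, \dots, n$. We may indeed call $I_d^*$ the dual problem of $I_d$ as
it is defined on the space $\mathbb{C}^m$ of dual variables $z_k$ associated
with the nontrivial constraints $Ax = b$ of the primal problem $I_d$.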


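We also have the following parallel.

\[
\begin{array}{l|l}
\textbf{Continuous Laplace-duality} & \textbf{Discrete Z-duality} \\ \hline
\hat f(b,c) := \displaystyle\int_{Ax = b;\ x \in \mathbb{R}^n_+} e^{c'x}\,ds
  & \hat f_d(b,c) := \displaystyle\sum_{Ax = b;\ x \in \mathbb{N}^n} e^{c'x} \\[8pt]
\hat F(\lambda, c) := \displaystyle\int_{\mathbb{R}^m} e^{-\lambda'y} \hat f(y,c)\,dy
  & \hat F_d(z, c) := \displaystyle\sum_{y \in \mathbb{Z}^m} z^{-y} \hat f_d(y, c) \\[8pt]
\phantom{\hat F(\lambda, c)} = \displaystyle\prod_{k=1}^{n} \frac{1}{(A'\lambda - c)_k}
  & \phantom{\hat F_d(z, c)} = \displaystyle\prod_{k=1}^{n} \frac{1}{1 - e^{c_k} z^{-A_k}} \\[8pt]
\text{with } \mathrm{Re}(A'\lambda - c) > 0.
  & \text{with } |z^{A_k}| > e^{c_k},\ k = 1, \dots, n.
\end{array}
\]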


2.3.3 Comparing $I^*$ and $I_d^*$


Observe that the dual problem $I_d^*$ in (2.20) is of the same nature as $I^*$
in (2.12) because both reduce to computing a complex integral whose integrand
is a rational function. In particular, as $I^*$, the problem $I_d^*$ can be
solved by Cauchy residue techniques (see for example [14]).

However, there is an important difference between $I^*$ and $I_d^*$. Whereas
the data $\{A_{jk}\}$ appear in $I^*$ as coefficients of the dual variables
$\lambda_k$ in $\hat F(\lambda, c)$, they now appear as exponents of the dual
variables $z_k$ in $\hat F_d(z, c)$. An immediate consequence of this fact is
that the rational function $\hat F_d(\cdot, c)$ has many more poles than
$\hat F(\cdot, c)$ (by considering one variable at a time), and in particular,
many of them are complex, whereas $\hat F(\cdot, c)$ has only real poles. As a
result, the integration of $\hat F_d(z, c)$ is more complicated than that of
$\hat F(\lambda, c)$, which is reflected in the discrete (or periodic) Brion
and Vergne formula described below. However, we will see that the poles of
$\hat F_d(z, c)$ are simply related to those of $\hat F(\lambda, c)$.


2.3.4 The “discrete” Brion and Vergne formula 


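Brion and Vergne [7] consider the generating function
$H : \mathbb{C}^m \to \mathbb{C}$ defined by
\[
\lambda \mapsto H(\lambda, c) := \sum_{y \in \mathbb{Z}^m}
\hat f_d(y, c)\, e^{-\langle \lambda, y \rangle},
\]
which, after the change of variable $z_i = e^{\lambda_i}$ for all
$i = 1, \dots, m$, reduces to $\hat F_d(z, c)$ in (2.20).

They obtain the nice formula (2.21) below. Namely, and with the same
notation used in Section 2.2.4, let $c \in \mathbb{R}^n$ be regular with $-c$
in the interior of $(\mathbb{R}^n_+ \cap V)^*$, and let $\gamma$ be a chamber.
Then for all $b \in \Lambda \cap \bar\gamma$ (recall
$\Lambda = A(\mathbb{Z}^n)$),
\[
\hat f_d(b,c) = \sum_{\sigma \in \mathcal{B}(\Delta, \gamma)}
\frac{e^{c'x(\sigma)}}{\mu(\sigma)}\, U_\sigma(b, c) \tag{2.21}
\]
for some coefficients $U_\sigma(b,c) \in \mathbb{R}$, a detailed expression for
which can be found in [7, Theorem 3.4, p. 821]. In particular, due to the
occurrence of complex poles in $\hat F_d(z, c)$, the term $U_\sigma(b,c)$ in
(2.21) is the periodic analogue of
$(\prod_{k \notin \sigma} (c_k - \pi^\sigma A_k))^{-1}$ in (2.14).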


Again, as for $\hat f(b,c)$, (2.21) can be re-written as
\[
\hat f_d(b,c) = \sum_{x(\sigma):\ \text{vertex of } \Omega(b)}
\frac{e^{c'x(\sigma)}}{\mu(\sigma)}\, U_\sigma(b, c), \tag{2.22}
\]
to compare with (2.14). To be more precise, by inspection of Brion and
Vergne's formula in [7, p. 821] in our current context, one may see that
\[
U_\sigma(b, c) = \sum_{g \in G(\sigma)}
\frac{e^{2i\pi b}(g)}{V_\sigma(g, c)}, \tag{2.23}
\]
where $G(\sigma) := (\oplus_{j \in \sigma} \mathbb{Z} A_j)^* / \Lambda^*$
(where $*$ denotes the dual lattice); it is a finite abelian group of order
$\mu(\sigma)$ and with (finitely many) characters $e^{2i\pi b}$ for all
$b \in \Lambda$. In particular, writing
$A_k = \sum_{j \in \sigma} u_{jk} A_j$ for all $k \notin \sigma$,
\[
e^{2i\pi A_k}(g) = e^{2i\pi \sum_{j \in \sigma} u_{jk} g_j},
\qquad k \notin \sigma.
\]
Moreover,
\[
V_\sigma(g, c) = \prod_{k \notin \sigma}
\left( 1 - e^{-2i\pi A_k}(g)\, e^{c_k - \pi^\sigma A_k} \right), \tag{2.24}
\]
with $A_k$, $\pi^\sigma$ as in (2.13) (and $\pi^\sigma$ rational). Again note
the importance of the reduced costs $c_k - \pi^\sigma A_k$ in the expression
for $\hat F_d(z, c)$.


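2.3.5 The discrete optimization problem $P_d$


We are now in a position to see how $I_d^*$ provides some nice information
about the optimal value $f_d(b,c)$ of the discrete optimization problem $P_d$.


Theorem 1. Let $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$ and let
$c \in \mathbb{Z}^n$ be regular with $-c$ in the interior of
$(\mathbb{R}^n_+ \cap V)^*$. Let $b \in \gamma \cap A(\mathbb{Z}^n)$ and let
$q \in \mathbb{N}$ be the least common multiple (l.c.m.) of
$\{\mu(\sigma)\}_{\sigma \in \mathcal{B}(\Delta, \gamma)}$.
If $Ax = b$ has no solution $x \in \mathbb{N}^n$ then $f_d(b,c) = -\infty$,
else assume that
\[
\max_{x(\sigma):\ \text{vertex of } \Omega(b)}
\left[ c'x(\sigma) + \lim_{r \to \infty} \frac{1}{r} \ln U_\sigma(b, rc) \right]
\]
is attained at a unique vertex $x(\sigma)$ of $\Omega(b)$. Then
\[
\begin{aligned}
f_d(b,c)
&= \max_{x(\sigma):\ \text{vertex of } \Omega(b)}
   \left[ c'x(\sigma) + \lim_{r \to \infty} \frac{1}{r} \ln U_\sigma(b, rc) \right] \\
&= \max_{x(\sigma):\ \text{vertex of } \Omega(b)}
   \left[ c'x(\sigma) + \frac{1}{q}
   \big( \deg(P_{\sigma b}) - \deg(Q_{\sigma b}) \big) \right]
\end{aligned} \tag{2.25}
\]
for some real-valued univariate polynomials $P_{\sigma b}$ and $Q_{\sigma b}$.
Moreover, the term $\lim_{r \to \infty} \ln U_\sigma(b, rc)/r$ or
$(\deg(P_{\sigma b}) - \deg(Q_{\sigma b}))/q$ in (2.25) is a sum of certain
reduced costs $c_k - \pi^\sigma A_k$ (with $k \notin \sigma$).


For a proof see Section 2.6.1.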


Remark 1. Of course, (2.25) is not easy to obtain but it shows that the optimal
value $f_d(b,c)$ of $P_d$ is strongly related to the various complex poles of
$\hat F_d(z, c)$. It is also interesting to note the crucial role played by the
reduced costs $c_k - \pi^\sigma A_k$ in linear programming. Indeed, from the
proof of Theorem 1 the optimal value $f_d(b,c)$ is the value of $c'x$ at some
vertex $x(\sigma)$ plus a sum of certain reduced costs (see (2.50) and the form
of the coefficients $\alpha_j(\sigma, c)$). Thus, as for the LP problem $P$,
the optimal value $f_d(b,c)$ of $P_d$ can be found by inspection of (certain
sums of) reduced costs associated with each vertex of $\Omega(b)$.

We next derive an asymptotic result that relates the respective optimal values
$f_d(b,c)$ and $f(b,c)$ of $P_d$ and $P$.


Corollary 1. Let $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$ and let
$c \in \mathbb{R}^n$ be regular with $-c$ in the interior of
$(\mathbb{R}^n_+ \cap V)^*$. Let $b \in \gamma \cap \Lambda$ and let
$x^* \in \Omega(b)$ be an optimal vertex of $P$, that is,
$f(b,c) = c'x^* = c'x(\sigma^*)$ for $\sigma^* \in \mathcal{B}(\Delta, \gamma)$,
the unique optimal basis of $P$. Then for $t \in \mathbb{N}$ sufficiently
large,
\[
f_d(tb, c) - f(tb, c) = \lim_{r \to \infty}
\left[ \frac{1}{r} \ln U_{\sigma^*}(tb, rc) \right]. \tag{2.26}
\]
In particular, for $t \in \mathbb{N}$ sufficiently large, the function
$t \mapsto f(tb, c) - f_d(tb, c)$ is periodic (constant) with period
$\mu(\sigma^*)$.


For a proof see Section 2.6.2. Thus, when $b \in \bar\gamma \cap \Lambda$ is
sufficiently large, say $b = t b_0$ with $b_0 \in \Lambda$ and
$t \in \mathbb{N}$, the "max" in (2.25) is attained at the unique optimal basis
$\sigma^*$ of the LP (2.5) (see details in Section 2.6.2).

From Remark 1 it also follows that for sufficiently large $t \in \mathbb{N}$,
the optimal value $f_d(tb, c)$ is equal to $f(tb, c)$ plus a certain sum of
reduced costs $c_k - \pi^{\sigma^*} A_k$ (with $k \notin \sigma^*$) with
respect to the optimal basis $\sigma^*$.
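
The periodic behavior described in Corollary 1 is easy to observe on a
one-constraint example. The sketch below is an illustration only: the data
$a = (3, 4, 5)$, $c = (6, 7, 8)$, $b_0 = 1$ and the helper names are
assumptions. It computes $f_d(t, c)$ by dynamic programming and $f(t, c)$ in
exact rational arithmetic. Here the optimal LP basis is $\sigma^* = \{1\}$ with
$\mu(\sigma^*) = 3$, and the reduced costs of $x_2$ and $x_3$ are $-1$ and
$-2$.

```python
from fractions import Fraction

def f_d(a, c, b):
    """max c'x s.t. a'x = b, x in N^n, by dynamic programming.

    Returns None when the equation has no nonnegative integer solution.
    """
    best = [None] * (b + 1)
    best[0] = 0
    for y in range(1, b + 1):
        for aj, cj in zip(a, c):
            if aj <= y and best[y - aj] is not None:
                candidate = best[y - aj] + cj
                if best[y] is None or candidate > best[y]:
                    best[y] = candidate
    return best[b]

def f_lp(a, c, b):
    """LP value for one constraint with a_j > 0: put all weight on the best ratio."""
    return b * max(Fraction(cj, aj) for aj, cj in zip(a, c))

a, c = (3, 4, 5), (6, 7, 8)     # illustrative data (assumption)
gaps = [f_lp(a, c, t) - f_d(a, c, t) for t in range(9, 40)]
```

In this instance the gap cycles through 0, 1, 2 with period 3, and each value
is a sum of (negated) reduced costs of the nonbasic variables needed to fix the
residue of $t$ modulo 3.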


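2.3.6 A dual comparison of $P$ and $P_d$


We now provide an alternative formulation of Brion and Vergne's discrete
formula (2.22), which explicitly relates dual variables of $P$ and $P_d$.
Recall that a feasible basis of the linear program $P$ is a basis
$\sigma \in \mathcal{B}(\Delta)$ for which $A_\sigma^{-1} b \ge 0$. Thus let
$\sigma \in \mathcal{B}(\Delta)$ be a feasible basis of the linear program $P$
and consider the system of $m$ equations in $\mathbb{C}^m$:
\[
z_1^{A_{1j}} \cdots z_m^{A_{mj}} = e^{c_j}, \qquad j \in \sigma. \tag{2.27}
\]
Recall that $A_\sigma$ is the nonsingular matrix
$[A_{j_1} | \cdots | A_{j_m}]$, with $j_k \in \sigma$ for all
$k = 1, \dots, m$. The above system (2.27) has $\rho(\sigma)$
($= \det(A_\sigma)$) solutions $\{z(k)\}_{k=1}^{\rho(\sigma)}$ written as
\[
z(k) = e^{\lambda} e^{2i\pi\theta(k)}, \qquad k = 1, \dots, \rho(\sigma) \tag{2.28}
\]
for $\rho(\sigma)$ vectors $\{\theta(k)\}$ in $\mathbb{C}^m$.
Indeed, writing $z = e^{\lambda} e^{2i\pi\theta}$ (that is, the vector
$\{e^{\lambda_j} e^{2i\pi\theta_j}\}_{j=1}^{m}$ in $\mathbb{C}^m$) and passing
to the logarithm in (2.27) yields
\[
A_\sigma'\lambda + 2i\pi A_\sigma'\theta = c_\sigma, \tag{2.29}
\]
where $c_\sigma \in \mathbb{R}^m$ is the vector $\{c_j\}_{j \in \sigma}$. Thus
$\lambda \in \mathbb{R}^m$ is the unique solution of
$A_\sigma'\lambda = c_\sigma$ and $\theta$ satisfies
\[
A_\sigma'\theta \in \mathbb{Z}^m. \tag{2.30}
\]
Equivalently, $\theta$ belongs to $(\oplus_{j \in \sigma} A_j \mathbb{Z})^*$,
the dual lattice of $\oplus_{j \in \sigma} A_j \mathbb{Z}$.

Thus there is a one-to-one correspondence between the $\rho(\sigma)$ solutions
$\{\theta(k)\}$ and the finite group
$G'(\sigma) = (\oplus_{j \in \sigma} A_j \mathbb{Z})^* / \mathbb{Z}^m$, where
$G(\sigma)$ is a subgroup of $G'(\sigma)$. Thus, with
$G(\sigma) = \{g_1, \dots, g_s\}$ and $s := \mu(\sigma)$, we can write
$(A_\sigma')^{-1} g_k = \theta_{g_k} = \theta(k)$, so that for every character
$e^{2i\pi y}$ of $G(\sigma)$, $y \in \Lambda$, we have
\[
e^{2i\pi y}(g) = e^{2i\pi y'\theta_g}, \qquad y \in \Lambda,\ g \in G(\sigma) \tag{2.31}
\]
and
\[
e^{2i\pi A_j}(g) = e^{2i\pi A_j'\theta_g} = 1, \qquad j \in \sigma. \tag{2.32}
\]
So, for every $\sigma \in \mathcal{B}(\Delta)$, denote by
$\{z_g\}_{g \in G(\sigma)}$ these $\mu(\sigma)$ solutions of (2.28), that is,
\[
z_g = e^{\lambda} e^{2i\pi\theta_g} \in \mathbb{C}^m, \qquad g \in G(\sigma), \tag{2.33}
\]
with $\lambda = (A_\sigma')^{-1} c_\sigma$, and where
$e^{\lambda} \in \mathbb{R}^m$ is the vector $\{e^{\lambda_i}\}_{i=1}^{m}$.

So, in the linear program $P$ we have a dual vector
$\lambda \in \mathbb{R}^m$ associated with each basis $\sigma$. In the integer
program $P_d$, with each (same) basis $\sigma$ there are now associated
$\mu(\sigma)$ "dual" (complex) vectors $\lambda + 2i\pi\theta_g$,
$g \in G(\sigma)$. Hence, with a basis $\sigma$ in linear programming, the
"dual variables" in integer programming are obtained from (a), the
corresponding dual variables $\lambda \in \mathbb{R}^m$ in linear programming,
and (b), a periodic correction term $2i\pi\theta_g \in \mathbb{C}^m$,
$g \in G(\sigma)$.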
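
The structure of the solutions of (2.27) can be explored on a small case. The
following sketch is restricted to $m = 2$ and to an illustrative basis matrix
(both are assumptions made here, as are the function and variable names). It
computes $\lambda = (A_\sigma')^{-1} c_\sigma$, enumerates the classes of
$\theta$ satisfying (2.30) modulo $\mathbb{Z}^2$ in exact arithmetic, and
builds the corresponding $z = e^{\lambda} e^{2i\pi\theta}$.

```python
import cmath
import itertools
from fractions import Fraction

def complex_dual_vectors(A_sigma, c_sigma):
    """Solutions z of (2.27) for a nonsingular 2x2 integer basis matrix.

    A_sigma = [[p, q], [r, s]] has the basis columns (p, r) and (q, s).
    Returns (lam, thetas, zs), with thetas taken modulo Z^2.
    """
    (p, q), (r, s) = A_sigma
    det = p * s - q * r
    # (A_sigma')^{-1} = (1 / det) * [[s, -r], [-q, p]]
    lam = ((s * c_sigma[0] - r * c_sigma[1]) / det,
           (-q * c_sigma[0] + p * c_sigma[1]) / det)
    thetas = set()
    # Integer right-hand sides w modulo |det| reach every class of theta.
    for w1, w2 in itertools.product(range(abs(det)), repeat=2):
        thetas.add((Fraction(s * w1 - r * w2, det) % 1,
                    Fraction(-q * w1 + p * w2, det) % 1))
    thetas = sorted(thetas)
    zs = [tuple(cmath.exp(lam[i] + 2j * cmath.pi * float(theta[i]))
                for i in range(2)) for theta in thetas]
    return lam, thetas, zs

A_sigma = [[2, 1], [0, 3]]      # illustrative basis (assumption), det = 6
c_sigma = (1.0, -0.5)
lam, thetas, zs = complex_dual_vectors(A_sigma, c_sigma)

# Residuals of (2.27): z_1^{A_1j} z_2^{A_2j} - e^{c_j} for both basis columns.
residuals = [abs(z1 ** A_sigma[0][j] * z2 ** A_sigma[1][j]
                 - cmath.exp(c_sigma[j]))
             for (z1, z2) in zs for j in range(2)]
```

All the vectors in `zs` share the same moduli $e^{\lambda}$ and differ only by
the phases $2i\pi\theta$, which is the "periodic correction" referred to above.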

We next introduce what we call the vertex residue function.


Definition 1. Let $b \in \Lambda$ and let $c \in \mathbb{R}^n$ be regular. Let
$\sigma \in \mathcal{B}(\Delta)$ be a feasible basis of the linear program $P$
and for every $r \in \mathbb{N}$, let $\{z_{gr}\}_{g \in G(\sigma)}$ be as in
(2.33), with $rc$ in lieu of $c$, that is,
\[
z_{gr} = e^{r\lambda} e^{2i\pi\theta_g} \in \mathbb{C}^m, \qquad
g \in G(\sigma), \quad \text{with } \lambda = (A_\sigma')^{-1} c_\sigma.
\]
The vertex residue function associated with a basis $\sigma$ of the linear
program $P$ is the function $R_\sigma(z_g, \cdot) : \mathbb{N} \to \mathbb{R}$
defined by
\[
r \mapsto R_\sigma(z_g, r) := \frac{1}{\mu(\sigma)} \sum_{g \in G(\sigma)}
\frac{z_{gr}^{b}}{\prod_{j \notin \sigma} (1 - z_{gr}^{-A_j} e^{r c_j})}, \tag{2.34}
\]
which is well defined because when $c$ is regular,
$|z_{gr}|^{A_k} \ne e^{r c_k}$ for all $k \notin \sigma$.


The name vertex residue is now clear because in the integration (2.20),
$R_\sigma(z_g, r)$ is to be interpreted as a generalized Cauchy residue, with
respect to the $\mu(\sigma)$ "poles" $\{z_{gr}\}$ of the generating function
$\hat F_d(z, rc)$.

Recall from Corollary 1 that when $b \in \gamma \cap \Lambda$ is sufficiently
large, say $b = t b_0$ with $b_0 \in \Lambda$ and some large
$t \in \mathbb{N}$, the "max" in (2.25) is attained at the unique optimal basis
$\sigma^*$ of the linear program $P$.
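Proposition 1. Let $c$ be regular with $-c \in (\mathbb{R}^n_+ \cap V)^*$ and
let $b \in \gamma \cap \Lambda$ be sufficiently large so that the max in (2.25)
is attained at the unique optimal basis $\sigma^*$ of the linear program $P$.
Let $\{z_{gr}\}_{g \in G(\sigma^*)}$ be as in (2.33) with $\sigma = \sigma^*$.
Then the optimal value of $P_d$ satisfies
\[
\begin{aligned}
f_d(b,c) &= \lim_{r \to \infty} \frac{1}{r} \ln \left[
\frac{1}{\mu(\sigma^*)} \sum_{g \in G(\sigma^*)}
\frac{z_{gr}^{b}}{\prod_{k \notin \sigma^*} (1 - z_{gr}^{-A_k} e^{r c_k})}
\right] \\
&= \lim_{r \to \infty} \frac{1}{r} \ln R_{\sigma^*}(z_g, r)
\end{aligned} \tag{2.35}
\]
and the optimal value of $P$ satisfies
\[
\begin{aligned}
f(b,c) &= \lim_{r \to \infty} \frac{1}{r} \ln \left[
\frac{1}{\mu(\sigma^*)} \sum_{g \in G(\sigma^*)}
\frac{|z_{gr}|^{b}}{\prod_{k \notin \sigma^*} (1 - |z_{gr}|^{-A_k} e^{r c_k})}
\right] \\
&= \lim_{r \to \infty} \frac{1}{r} \ln R_{\sigma^*}(|z_g|, r).
\end{aligned} \tag{2.36}
\]

For a proof see Section 2.6.3.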
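
The two limits in Proposition 1 can be observed numerically when there is a
single constraint $a'x = b$. The sketch below rests on simplifying assumptions
that are not in the text: $m = 1$, $\gcd(a) = 1$ (so that
$\Lambda = \mathbb{Z}$, $G(\sigma)$ is cyclic of order $a_j$ for
$\sigma = \{j\}$, and $\theta_g = g / a_j$), and illustrative data
$a = (3, 4, 5)$, $c = (6, 7, 8)$. The factor $e^{r\lambda b}$ is handled in
logarithms to avoid overflow.

```python
import cmath
import math

def scaled_log_vertex_residue(a, c, b, j, r, use_modulus=False):
    """(1/r) ln |R_sigma| from (2.34) for sigma = {j} and one constraint.

    With use_modulus=True every z_gr is replaced by |z_gr|, as in (2.36).
    """
    lam = c[j] / a[j]
    total = 0.0
    for g in range(a[j]):
        phase = 1.0 if use_modulus else cmath.exp(2j * math.pi * g / a[j])
        term = phase ** b
        for k in range(len(a)):
            if k != j:
                # z^{-a_k} e^{r c_k} = e^{r (c_k - lam a_k)} * phase^{-a_k}
                term /= 1 - math.exp(r * (c[k] - lam * a[k])) * phase ** (-a[k])
        total += term
    total /= a[j]
    return lam * b + math.log(abs(total)) / r

a, c = (3, 4, 5), (6, 7, 8)     # illustrative data (assumption)
discrete = [scaled_log_vertex_residue(a, c, b, j=0, r=8) for b in (9, 10, 11)]
continuous = [scaled_log_vertex_residue(a, c, b, j=0, r=8, use_modulus=True)
              for b in (9, 10, 11)]
```

For these data the integer optima are 18, 19 and 20 for $b = 9, 10, 11$, while
the LP optima are 18, 20 and 22. Already at $r = 8$ the complex version is
close to the former and the modulus version to the latter.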

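Proposition 1 shows that there is indeed a strong relationship between
the integer program $P_d$ and its continuous analogue, the linear program
$P$. Both optimal values obey exactly the same formula (2.35), but for the
continuous version, the complex vector $z_g \in \mathbb{C}^m$ is replaced by
the vector $|z_g| = e^{\lambda^*} \in \mathbb{R}^m$ of its component moduli,
where $\lambda^* \in \mathbb{R}^m$ is the optimal solution of the LP dual of
$P$. In summary, when $c \in \mathbb{R}^n$ is regular and
$b \in \gamma \cap \Lambda$ is sufficiently large, we have the following
correspondence.

\[
\begin{array}{l|l}
\textbf{Linear program } P & \textbf{Integer program } P_d \\ \hline
\text{unique optimal basis } \sigma^* & \text{unique optimal basis } \sigma^* \\[4pt]
1 \text{ optimal dual vector} & \mu(\sigma^*) \text{ dual vectors} \\
\lambda^* \in \mathbb{R}^m & z_g \in \mathbb{C}^m,\ g \in G(\sigma^*), \\
 & \ln z_g = \lambda^* + 2i\pi\theta_g \\[4pt]
f(b,c) = \displaystyle\lim_{r \to \infty} \frac{1}{r} \ln R_{\sigma^*}(|z_g|, r)
  & f_d(b,c) = \displaystyle\lim_{r \to \infty} \frac{1}{r} \ln R_{\sigma^*}(z_g, r)
\end{array}
\]

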
2.4 A discrete Farkas lemma 


In this section we are interested in a discrete analogue of the continuous
Farkas lemma. That is, with $A \in \mathbb{Z}^{m \times n}$ and
$b \in \mathbb{Z}^m$, consider the issue of the existence of a nonnegative
integral solution $x \in \mathbb{N}^n$ to the system of linear equations
$Ax = b$.

The (continuous) Farkas lemma, which states that given
$A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$,
\[
\{x \in \mathbb{R}^n \mid Ax = b,\ x \ge 0\} \ne \emptyset
\iff [A'\lambda \ge 0 \Rightarrow b'\lambda \ge 0], \tag{2.37}
\]
has no discrete analogue in an explicit form. For instance, the Gomory
functions used in Blair and Jeroslow [6] (see also Schrijver [19, Corollary
23.4b]) are implicitly and iteratively defined, and are not directly defined in
terms of the data $A$, $b$. On the other hand, for various characterizations of
feasibility of the linear diophantine equations $Ax = b$, where
$x \in \mathbb{Z}^n$, the interested reader is referred to Schrijver [19,
Section 4].

Before proceeding to the general case when $A \in \mathbb{Z}^{m \times n}$, we
first consider the case $A \in \mathbb{N}^{m \times n}$, where $A$ (and thus
$b$) has only nonnegative entries.


2.4.1 The case when $A \in \mathbb{N}^{m \times n}$


In this section we assume that $A \in \mathbb{N}^{m \times n}$ and thus
necessarily $b \in \mathbb{N}^m$, since otherwise
$\{x \in \mathbb{N}^n \mid Ax = b\} = \emptyset$.


Theorem 2. Let $A \in \mathbb{N}^{m \times n}$ and $b \in \mathbb{N}^m$. Then
the following two propositions (i) and (ii) are equivalent:
(i) The linear system $Ax = b$ has a solution $x \in \mathbb{N}^n$.
(ii) The real-valued polynomial
$z \mapsto z^b - 1 := z_1^{b_1} \cdots z_m^{b_m} - 1$ can be written
\[
z^b - 1 = \sum_{j=1}^{n} Q_j(z)\,(z^{A_j} - 1), \tag{2.38}
\]
for some real-valued polynomials $Q_j \in \mathbb{R}[z_1, \dots, z_m]$,
$j = 1, \dots, n$, all of which have nonnegative coefficients.
In addition, the degree of the $Q_j$ in (2.38) is bounded by
\[
b^* := \sum_{j=1}^{m} b_j - \min_{k} \sum_{j=1}^{m} A_{jk}. \tag{2.39}
\]

For a proof see Section 2.6.4. Hence Theorem 2 reduces the issue of existence
of a solution $x \in \mathbb{N}^n$ to a particular ideal membership problem,
that is, $Ax = b$ has a solution $x \in \mathbb{N}^n$ if and only if the
polynomial $z^b - 1$ belongs to the binomial ideal
$I = \langle z^{A_j} - 1 \rangle_{j=1,\dots,n} \subset \mathbb{R}[z_1, \dots, z_m]$
for some weights $Q_j$ with nonnegative coefficients.
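
The implication (i) $\Rightarrow$ (ii) of Theorem 2 can be illustrated
constructively. The sketch below does not reproduce the proof of Section 2.6.4
(which lies outside this passage); it assumes one natural construction, the
telescoping identity
$z^{Ax} - 1 = \sum_j z^{A_1 x_1 + \dots + A_{j-1} x_{j-1}} (z^{A_j x_j} - 1)$,
and verifies (2.38) coefficient by coefficient. Polynomials are stored as
dictionaries from exponent tuples to coefficients; all names and data are
illustrative.

```python
def farkas_certificate(A, x):
    """Build Q_1, ..., Q_n with nonnegative coefficients from x in N^n.

    Q_j = z^{A_1 x_1 + ... + A_{j-1} x_{j-1}} * (1 + z^{A_j} + ... + z^{(x_j - 1) A_j}).
    Returns the list of Q_j and the right-hand side b = A x.
    """
    m, n = len(A), len(A[0])
    cols = [tuple(A[i][j] for i in range(m)) for j in range(n)]
    prefix = (0,) * m
    Q = []
    for j in range(n):
        q_j = {}
        for t in range(x[j]):
            exp = tuple(prefix[i] + t * cols[j][i] for i in range(m))
            q_j[exp] = q_j.get(exp, 0) + 1
        Q.append(q_j)
        prefix = tuple(prefix[i] + x[j] * cols[j][i] for i in range(m))
    return Q, prefix

def satisfies_identity(A, b, Q):
    """Check z^b - 1 == sum_j Q_j(z) (z^{A_j} - 1), term by term."""
    m = len(A)
    diff = {}                               # left-hand side minus right-hand side

    def add(exp, coef):
        diff[exp] = diff.get(exp, 0) + coef

    add(tuple(b), 1)
    add((0,) * m, -1)
    for j, q_j in enumerate(Q):
        col = tuple(A[i][j] for i in range(m))
        for exp, coef in q_j.items():
            add(tuple(e + a for e, a in zip(exp, col)), -coef)
            add(exp, coef)
    return all(v == 0 for v in diff.values())

A = [[1, 2, 0], [0, 1, 3]]                  # illustrative data (assumption)
Q, b = farkas_certificate(A, x=(2, 1, 1))
max_degree = max(sum(exp) for q_j in Q for exp in q_j)
```

In this instance $b = (4, 4)$, the bound (2.39) gives $b^* = 8 - 1 = 7$, and the
largest degree among the constructed $Q_j$ is 5.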

Interestingly, consider the ideal
$J \subset \mathbb{R}[z_1, \dots, z_m, y_1, \dots, y_n]$ generated by the
binomials $z^{A_j} - y_j$, $j = 1, \dots, n$, and let $G$ be a Gröbner basis of
$J$. Using the algebraic approach described in Adams and Loustaunau [2,
Section 2.8], it is known that $Ax = b$ has a solution $x \in \mathbb{N}^n$ if
and only if the monomial $z^b$ can be reduced (with respect to $G$) to some
monomial $y^\alpha$, in which case $\alpha \in \mathbb{N}^n$ is a feasible
solution. Observe that in this case, we do not know $\alpha \in \mathbb{N}^n$
in advance (we look for it!) to test whether $z^b - y^\alpha \in J$. One has
to apply Buchberger's algorithm to (i) find a reduced Gröbner basis $G$ of
$J$, and (ii) reduce $z^b$ with respect to $G$ and check whether the final
result is a monomial $y^\alpha$. Moreover, in the latter approach one uses
polynomials in $n + m$ (primal) variables $y$ and (dual) variables $z$, in
contrast with the (only) $m$ dual variables $z$ in Theorem 2.


Remark 2. (a) With $b^*$ as in (2.39) denote by
$s(b^*) := \binom{m + b^*}{b^*}$ the dimension of the vector space of
polynomials of degree $b^*$ in $m$ variables. In view of Theorem 2, and given
$b \in \mathbb{N}^m$, checking the existence of a solution
$x \in \mathbb{N}^n$ to $Ax = b$ reduces to checking whether or not there
exists a nonnegative solution $y$ to a system of linear equations with:

• $n \times s(b^*)$ variables, the nonnegative coefficients of the $Q_j$;

• $s(b^* + \max_k \sum_{j=1}^{m} A_{jk})$ equations to identify the terms of
the same powers on both sides of (2.38).

This in turn reduces to solving an LP problem with $n\,s(b^*)$ variables and
$s(b^* + \max_k \sum_j A_{jk})$ equality constraints. Observe that in view of
(2.38), this LP has a matrix of constraints with coefficients made up only of
0's and $\pm 1$'s.

(b) From the proof of Theorem 2 in Section 2.6.4, it easily follows that one
may even constrain the weights $Q_j$ in (2.38) to be polynomials in
$\mathbb{Z}[z_1, \dots, z_m]$ (instead of $\mathbb{R}[z_1, \dots, z_m]$) with
nonnegative coefficients. However, (a) shows that the strength of Theorem 2 is
precisely allowing $Q_j \in \mathbb{R}[z_1, \dots, z_m]$ while enabling us to
check feasibility by solving a (continuous) linear program. By enforcing
$Q_j \in \mathbb{Z}[z_1, \dots, z_m]$ one would end up with an integer linear
system whose size was larger than that of the original problem.
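
To get a feeling for the size of the LP in Remark 2(a), the following small
sketch evaluates the two counts for given data. It assumes that $s(d)$ is the
number of monomials of degree at most $d$ in $m$ variables, that is
$\binom{m + d}{d}$; the function name and the data are illustrative.

```python
from math import comb

def farkas_lp_size(A, b):
    """Return (number of variables, number of equality constraints) of the LP."""
    m, n = len(A), len(A[0])
    col_sums = [sum(A[i][k] for i in range(m)) for k in range(n)]
    b_star = sum(b) - min(col_sums)             # degree bound (2.39)

    def s(d):
        return comb(m + d, d)                   # monomials of degree <= d in m variables

    return n * s(b_star), s(b_star + max(col_sums))

size = farkas_lp_size([[1, 2, 0], [0, 1, 3]], [4, 4])
```

For this instance $b^* = 7$, so the LP has $3 \times 36 = 108$ variables and 66
equality constraints, which shows how quickly the size grows with $b$.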


2.4.2 The general case 


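In this section we consider the general case where $A \in \mathbb{Z}^{m\times n}$ so that $A$ may
have negative entries, and we assume that the convex polyhedron $\Omega := \{x \in
\mathbb{R}^n_+ \mid Ax = b\}$ is compact.

The above arguments cannot be repeated because of the occurrence of
negative powers. However, let $\alpha \in \mathbb{N}^n$ and $\beta \in \mathbb{N}$ be such that

$$\hat{A}_{jk} := A_{jk} + \alpha_k \ge 0, \quad k = 1,\ldots,n; \qquad \hat{b}_j := b_j + \beta \ge 0, \tag{2.40}$$

for all $j = 1,\ldots,m$. Moreover, as $\Omega$ is compact, we have that

$$\max_{x \in \mathbb{N}^n}\Bigl\{\sum_{j=1}^{n} \alpha_j x_j \;\Big|\; Ax = b\Bigr\} \;\le\; \max_{x \in \mathbb{R}^n_+}\Bigl\{\sum_{j=1}^{n} \alpha_j x_j \;\Big|\; Ax = b\Bigr\} =: \rho^*(\alpha) < \infty. \tag{2.41}$$

Observe that given $\alpha \in \mathbb{N}^n$, the scalar $\rho^*(\alpha)$ is easily calculated by solving
an LP problem. Choose $\mathbb{N} \ni \beta \ge \rho^*(\alpha)$, and let $\hat{A} \in \mathbb{N}^{m\times n}$ and $\hat{b} \in \mathbb{N}^m$ be
as in (2.40). Then the existence of solutions $x \in \mathbb{N}^n$ to $Ax = b$ is equivalent
to the existence of solutions $(x, u) \in \mathbb{N}^n \times \mathbb{N}$ to the system of linear equations

$$Q: \quad \hat{A}x + u\,e_m = \hat{b}, \qquad \sum_{j=1}^{n} \alpha_j x_j + u = \beta, \tag{2.42}$$

where $e_m$ denotes the vector of ones in $\mathbb{R}^m$. Indeed, if $Ax = b$ with $x \in \mathbb{N}^n$ then

$$Ax + e_m \sum_{j=1}^{n} \alpha_j x_j - e_m \sum_{j=1}^{n} \alpha_j x_j = b + e_m\beta - e_m\beta,$$

or equivalently,

$$\hat{A}x + \Bigl(\beta - \sum_{j=1}^{n} \alpha_j x_j\Bigr) e_m = \hat{b},$$

and thus, as $\beta \ge \rho^*(\alpha) \ge \sum_{j=1}^{n} \alpha_j x_j$ (see, for example, (2.41)), we see that
$(x,u)$ with $\beta - \sum_{j=1}^{n} \alpha_j x_j =: u \in \mathbb{N}$ is a solution of (2.42). Conversely, let
$(x,u) \in \mathbb{N}^n \times \mathbb{N}$ be a solution of (2.42). Using the definitions of $\hat{A}$ and $\hat{b}$, it
then follows immediately that

$$Ax + e_m \sum_{j=1}^{n} \alpha_j x_j + u\,e_m = b + \beta e_m; \qquad \sum_{j=1}^{n} \alpha_j x_j + u = \beta,$$

so that $Ax = b$. The system of linear equations (2.42) can be cast in the form

$$B \begin{bmatrix} x \\ u \end{bmatrix} = \begin{bmatrix} \hat{b} \\ \beta \end{bmatrix} \quad \text{with} \quad B := \begin{bmatrix} \hat{A} & e_m \\ \alpha' & 1 \end{bmatrix}, \tag{2.43}$$

and as $B$ only has entries in $\mathbb{N}$, we are back to the case analyzed in Section
2.4.1.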
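The reduction to a nonnegative system can be checked on a small instance. The Python sketch below builds the lifted matrix and right-hand side of (2.43) from the shifts in (2.40) and verifies by brute-force enumeration that the lifted system has the same solutions in $x$ as $Ax = b$. The instance, the names `lift` and `nonneg_solutions`, and the choice $\beta = 2$ (hand-picked so that it exceeds the LP bound, which is 1.5 for this data) are illustrative assumptions, not taken from the text.

```python
from itertools import product

def lift(A, b, alpha, beta):
    """Build B and the right-hand side of the lifted system (2.43)."""
    m, n = len(A), len(A[0])
    A_hat = [[A[j][k] + alpha[k] for k in range(n)] for j in range(m)]
    b_hat = [b[j] + beta for j in range(m)]
    B = [row + [1] for row in A_hat] + [list(alpha) + [1]]
    return B, b_hat + [beta]

def nonneg_solutions(M, rhs, bound):
    """Enumerate integer solutions of M x = rhs with 0 <= x_i <= bound."""
    ncols = len(M[0])
    return [x for x in product(range(bound + 1), repeat=ncols)
            if all(sum(c * xi for c, xi in zip(row, x)) == t
                   for row, t in zip(M, rhs))]

# Hypothetical instance: x1 + x2 + x3 = 3, x1 - x2 = 0.
A = [[1, 1, 1],
     [1, -1, 0]]
b = [3, 0]
alpha, beta = [0, 1, 0], 2   # beta >= max{x2 : Ax = b, x >= 0} = 1.5

B, rhs = lift(A, b, alpha, beta)
original = nonneg_solutions(A, b, bound=3)
lifted = nonneg_solutions(B, rhs, bound=3)

print(B, rhs)                       # all entries are nonnegative
print(original)                     # solutions x of Ax = b
print([s[:-1] for s in lifted])     # x-parts of the lifted solutions
```

The last component of each lifted solution is the slack $u = \beta - \sum_j \alpha_j x_j$, so dropping it recovers exactly the solutions of the original system.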


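Corollary 2. Let $A \in \mathbb{Z}^{m\times n}$ and $b \in \mathbb{Z}^m$ and assume that $\Omega := \{x \in
\mathbb{R}^n_+ \mid Ax = b\}$ is compact. Let $\alpha \in \mathbb{N}^n$ and $\beta \in \mathbb{N}$ be as in (2.40) with
$\beta \ge \rho^*(\alpha)$ (see, for example, (2.41)). Then the following two propositions (i)
and (ii) are equivalent:

(i) The system of linear equations $Ax = b$ has a solution $x \in \mathbb{N}^n$;

(ii) The real-valued polynomial $z \mapsto z^{b}(zy)^{\beta} - 1 \in \mathbb{R}[z_1,\ldots,z_m,y]$ can be
written

$$z^{b}(zy)^{\beta} - 1 = Q_0(z,y)\,(zy - 1) + \sum_{j=1}^{n} Q_j(z,y)\,\bigl(z^{A_j}(zy)^{\alpha_j} - 1\bigr) \tag{2.44}$$

for some real-valued polynomials $\{Q_j\}_{j=0}^{n}$ in $\mathbb{R}[z_1,\ldots,z_m,y]$, all of which
have nonnegative coefficients. (Here $zy$ stands for the product $z_1\cdots z_m\,y$.)

The degree of the $Q_j$ in (2.44) is bounded by

$$(m+1)\beta + \sum_{j=1}^{m} b_j - \min\Bigl[\,m+1,\ \min_{k=1,\ldots,n}\Bigl((m+1)\alpha_k + \sum_{j=1}^{m} A_{jk}\Bigr)\Bigr].$$

Proof. Let $\hat{A} \in \mathbb{N}^{m\times n}$, $\hat{b} \in \mathbb{N}^m$, $\alpha \in \mathbb{N}^n$ and $\beta \in \mathbb{N}$ be as in (2.40) with
$\beta \ge \rho^*(\alpha)$. Then apply Theorem 2 to the equivalent form (2.43) of the system
$Q$ in (2.42), where $B$ and $(\hat{b}, \beta)$ only have entries in $\mathbb{N}$, and use the definitions
of $\hat{A}$ and $\hat{b}$.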

Indeed Theorem 2 and Corollary 2 have the flavor of a Farkas lemma as
they are stated with the transpose $A'$ of $A$ and involve the dual variables $z_k$
associated with the constraints $Ax = b$. In addition, and as expected, Corollary 2
implies the continuous Farkas lemma because if $\{x \in \mathbb{N}^n \mid Ax = b\} \ne \emptyset$, then
from (2.44), and with $z := e^{\lambda}$ and $y := (z_1 \cdots z_m)^{-1}$,

$$e^{b'\lambda} - 1 = \sum_{j=1}^{n} Q_j\bigl(e^{\lambda_1},\ldots,e^{\lambda_m}, e^{-\sum_i \lambda_i}\bigr)\,\bigl(e^{(A'\lambda)_j} - 1\bigr). \tag{2.45}$$

Therefore $A'\lambda \ge 0 \Rightarrow e^{(A'\lambda)_j} - 1 \ge 0$ for all $j = 1,\ldots,n$, and as the $Q_j$ have
nonnegative coefficients, we have $e^{b'\lambda} - 1 \ge 0$, which in turn implies $b'\lambda \ge 0$.

Equivalently, evaluating the partial derivatives of both sides of (2.45) with
respect to $\lambda_j$, at the point $\lambda = 0$, yields $b_j = \sum_{k=1}^{n} A_{jk} x_k$ for all $j = 1,\ldots,m$,
with $x_k := Q_k(1,\ldots,1) \ge 0$. Thus $Ax = b$ for some $x \in \mathbb{R}^n_+$.


2.5 Conclusion 


We have proposed what we think is a natural duality framework for the
integer program $P_d$. It essentially relies on the $Z$-transform of the associated
counting problem $I_d$, for which the Brion and Vergne inverse formula
appears to be an important tool for analyzing $P_d$. In particular, it
shows that the usual reduced costs in linear programming, combined with
the periodicity phenomena associated with the complex poles of $F_d(z,c)$,
also play an essential role for analyzing $P_d$. Moreover, for the standard dual
vector $\lambda \in \mathbb{R}^m$ associated with each basis $B$ of the linear program $P$, there
are $\det(B)$ corresponding dual vectors $z \in \mathbb{C}^m$ for the discrete problem $P_d$.
In addition, for $b$ sufficiently large, the optimal value of $P_d$ is a function of
these dual vectors associated with the optimal basis of the linear program $P$.
A topic of further research is to establish an explicit dual optimization
problem $P_d^*$ in these dual variables. We hope that the above results will stimulate
further research in this direction.


2.6 Proofs 


A proof in French of Theorem 1 can be found in Lasserre [15]. The English 
proof in [16] is reproduced below. 


2.6.1 Proof of Theorem 1 


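Proof. Use (2.1) and (2.22) to obtain

$$
\begin{aligned}
e^{f_d(b,c)} &= \lim_{r\to\infty}\Biggl[\;\sum_{x(\sigma):\ \text{vertex of } \Omega(b)} \frac{e^{rc'x(\sigma)}}{\mu(\sigma)}\, U_\sigma(b, rc)\Biggr]^{1/r} \\
&= \lim_{r\to\infty}\Biggl[\;\sum_{x(\sigma):\ \text{vertex of } \Omega(b)} \frac{e^{rc'x(\sigma)}}{\mu(\sigma)} \sum_{g\in G(\sigma)} \frac{e^{2i\pi b}(g)}{V_\sigma(g, rc)}\Biggr]^{1/r} \\
&= \lim_{r\to\infty}\Biggl[\;\sum_{x(\sigma):\ \text{vertex of } \Omega(b)} H_\sigma(b, rc)\Biggr]^{1/r}.
\end{aligned}
\tag{2.46}
$$

Next, from the expression of $V_\sigma(g, c)$ in (2.24), and with $rc$ in lieu of $c$, we see
that $V_\sigma(g, rc)$ is a function of $y := e^{r}$, which in turn implies that $H_\sigma(b, rc)$ is
also a function of $e^{r}$, of the form

$$H_\sigma(b, rc) = (e^{r})^{c'x(\sigma)} \sum_{g\in G(\sigma)} \frac{e^{2i\pi b}(g)}{\sum_j \delta_j(\sigma, g, A)\,(e^{r})^{\alpha_j(\sigma, c)}} \tag{2.47}$$

for finitely many coefficients $\{\delta_j(\sigma, g, A), \alpha_j(\sigma, c)\}$. Note that the coefficients
$\alpha_j(\sigma, c)$ are sums of some reduced costs $c_k - \pi^{\sigma} A_k$ (with $k \notin \sigma$). In addition,
the (complex) coefficients $\{\delta_j(\sigma, g, A)\}$ do not depend on $b$.

Let $y := e^{r/q}$, where $q$ is the l.c.m. of $\{\mu(\sigma)\}_{\sigma\in\mathbb{B}(\Delta,\gamma)}$. As $q(c_k - \pi^{\sigma} A_k) \in \mathbb{Z}$
for all $k \notin \sigma$,

$$H_\sigma(b, rc) = y^{q\,c'x(\sigma)} \times \frac{P_{\sigma b}(y)}{Q_{\sigma b}(y)} \tag{2.48}$$

for some polynomials $P_{\sigma b}, Q_{\sigma b} \in \mathbb{R}[y]$. In view of (2.47), the degree of $P_{\sigma b}$
and $Q_{\sigma b}$, which depends on $b$ but not on the magnitude of $b$, is uniformly
bounded in $b$.

Therefore, as $r \to \infty$,

$$H_\sigma(b, rc) \approx (e^{r/q})^{\,q\,c'x(\sigma) + \deg(P_{\sigma b}) - \deg(Q_{\sigma b})}, \tag{2.49}$$

so that the limit in (2.46), which is given by $\max_\sigma e^{c'x(\sigma)} \lim_{r\to\infty} U_\sigma(b, rc)^{1/r}$ (as
we have assumed unicity of the maximizer $\sigma$), is also

$$\max_{x(\sigma):\ \text{vertex of } \Omega(b)} e^{\,c'x(\sigma) + (\deg(P_{\sigma b}) - \deg(Q_{\sigma b}))/q}.$$

Therefore $f_d(b,c) = -\infty$ if $Ax = b$ has no solution $x \in \mathbb{N}^n$, else

$$f_d(b,c) = \max_{x(\sigma):\ \text{vertex of } \Omega(b)} \Bigl[\,c'x(\sigma) + \frac{1}{q}\bigl(\deg(P_{\sigma b}) - \deg(Q_{\sigma b})\bigr)\Bigr], \tag{2.50}$$

from which (2.25) follows easily.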


2.6.2 Proof of Corollary 1 


Proof. Let $t \in \mathbb{N}$ and note that $f(tb, rc) = tr f(b,c) = tr\,c'x^* = tr\,c'x(\sigma^*)$. As
in the proof of Theorem 1, and with $tb$ in lieu of $b$, we have

$$\hat{f}_d(tb, rc)^{1/r} = e^{tc'x^*}\Biggl[\frac{U_{\sigma^*}(tb, rc)}{\mu(\sigma^*)} + \sum_{\text{vertex } x(\sigma)\ne x^*} \Bigl(\frac{e^{rc'x(\sigma)}}{e^{rc'x(\sigma^*)}}\Bigr)^{t}\, \frac{U_\sigma(tb, rc)}{\mu(\sigma)}\Biggr]^{1/r}$$

and from (2.47)–(2.48), setting $\delta_\sigma := c'x^* - c'x(\sigma) > 0$ and $y := e^{r/q}$,

$$\hat{f}_d(tb, rc)^{1/r} = e^{tc'x^*}\Biggl[\frac{U_{\sigma^*}(tb, rc)}{\mu(\sigma^*)} + \sum_{\text{vertex } x(\sigma)\ne x^*} y^{-tq\delta_\sigma}\, \frac{P_{\sigma tb}(y)}{Q_{\sigma tb}(y)}\Biggr]^{1/r}.$$

Observe that $c'x(\sigma^*) - c'x(\sigma) > 0$ whenever $\sigma \ne \sigma^*$ because $\Omega(y)$ is simple
if $y \in \gamma$, and $c$ is regular. Indeed, as $x^*$ is an optimal vertex of the LP problem
$P$, the reduced costs $c_k - \pi^{\sigma^*} A_k$ ($k \notin \sigma^*$) with respect to the optimal basis
$\sigma^*$ are all nonpositive, and in fact, strictly negative because $c$ is regular (see
Section 2.2.4). Therefore the term

$$\sum_{\text{vertex } x(\sigma)\ne x^*} y^{-tq\delta_\sigma}\, \frac{P_{\sigma tb}(y)}{Q_{\sigma tb}(y)}$$

is negligible for $t$ sufficiently large, when compared with $U_{\sigma^*}(tb, rc)$. This is
because the degrees of $P_{\sigma tb}$ and $Q_{\sigma tb}$ depend on $tb$ but not on the magnitude
of $tb$ (see (2.47)–(2.48)), and they are uniformly bounded in $tb$. Hence taking
the limit as $r \to \infty$ yields

$$e^{f_d(tb,c)} = \lim_{r\to\infty}\Bigl[\frac{e^{rtc'x(\sigma^*)}}{\mu(\sigma^*)}\, U_{\sigma^*}(tb, rc)\Bigr]^{1/r} = e^{tc'x(\sigma^*)} \lim_{r\to\infty} U_{\sigma^*}(tb, rc)^{1/r},$$

from which (2.26) follows easily.

Finally, the periodicity comes from the term $e^{2i\pi tb}(g)$ in (2.23) for $g \in
G(\sigma^*)$. The period is then of the order of $G(\sigma^*)$.


2.6.3 Proof of Proposition 3.1


Proof. Let U,«(b,c) be as in (2.23)—(2.24). It follows immediately that 77° = 
(A*)/ and so 


—1n?" Ap. 2imAg ( 


es 9) = ea Abd" p—2it Ai, 8g - ye, ge Cet 


Next, using c’r(o*) = 0'd*, 


g0' 2(0") <2imd( g) = ob'd* 21m’, _ ZF g € G(o*). 
Therefore 
lo 1 yy % 
c'a(o) = gf 
E Ug« (b, c) — = 
(o*) Lo) (1 — 2g “*ers) 


I 


Rive (255 Ly, 


and (2.35) follows from (2.25) because, with rc in lieu of c, z, becomes zg, = 
e"" <2'785 (only the modulus changes). 

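Next, as only the modulus of $z_g$ is involved in (2.36), we have $|z_g| = e^{r\lambda^*}$
for all $g \in G(\sigma^*)$, so that

$$\frac{1}{\mu(\sigma^*)} \sum_{g\in G(\sigma^*)} \frac{|z_g|^{b}}{\prod_{k\notin\sigma^*}\bigl(1 - |z_g|^{-A_k} e^{rc_k}\bigr)} = \frac{e^{rb'\lambda^*}}{\prod_{k\notin\sigma^*}\bigl(1 - e^{r(c_k - A_k'\lambda^*)}\bigr)},$$

and, as $r \to \infty$,

$$\frac{e^{rb'\lambda^*}}{\prod_{k\notin\sigma^*}\bigl(1 - e^{r(c_k - A_k'\lambda^*)}\bigr)} \approx e^{rb'\lambda^*}$$

because $(c_k - A_k'\lambda^*) < 0$ for all $k \notin \sigma^*$. Therefore

$$\lim_{r\to\infty}\Biggl[\frac{e^{rb'\lambda^*}}{\prod_{k\notin\sigma^*}\bigl(1 - e^{r(c_k - A_k'\lambda^*)}\bigr)}\Biggr]^{1/r} = e^{b'\lambda^*} = e^{f(b,c)},$$

the desired result.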


2.6.4 Proof of Theorem 2 


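Proof. (ii) $\Rightarrow$ (i). Assume that $z^{b} - 1$ can be written as in (2.38) for some
polynomials $\{Q_j\}$ with nonnegative coefficients $\{Q_{j\alpha}\}$, that is,
$Q_j(z) = \sum_{\alpha\in\mathbb{N}^m} Q_{j\alpha} z^{\alpha} = \sum_{\alpha\in\mathbb{N}^m} Q_{j\alpha} z_1^{\alpha_1}\cdots z_m^{\alpha_m}$, for finitely many nonzero (and
nonnegative) coefficients $Q_{j\alpha}$. Using the notation of Section 2.3, the function
$\hat{f}_d(b,0)$, which (as $c = 0$) counts the nonnegative integral solutions $x \in \mathbb{N}^n$
to the equation $Ax = b$, is given by

$$\hat{f}_d(b,0) = \frac{1}{(2\pi i)^m} \int_{|z_1|=\gamma_1}\cdots\int_{|z_m|=\gamma_m} \frac{z^{b-e_m}}{\prod_{k=1}^{n}(1 - z^{-A_k})}\,dz,$$

where $\gamma \in \mathbb{R}^m$ satisfies $A'\gamma > 0$ (see (2.18) and (2.20)).
Writing $z^{b-e_m}$ as $z^{-e_m}(z^{b} - 1 + 1)$ we obtain

$$\hat{f}_d(b,0) = B_1 + B_2,$$

with

$$B_1 = \frac{1}{(2\pi i)^m} \int_{|z_1|=\gamma_1}\cdots\int_{|z_m|=\gamma_m} \frac{z^{-e_m}}{\prod_{k=1}^{n}(1 - z^{-A_k})}\,dz$$

and

$$
\begin{aligned}
B_2 &:= \frac{1}{(2\pi i)^m} \int_{|z_1|=\gamma_1}\cdots\int_{|z_m|=\gamma_m} \frac{z^{-e_m}(z^{b} - 1)}{\prod_{k=1}^{n}(1 - z^{-A_k})}\,dz \\
&= \sum_{j=1}^{n} \frac{1}{(2\pi i)^m} \int_{|z_1|=\gamma_1}\cdots\int_{|z_m|=\gamma_m} \frac{z^{A_j-e_m}\,Q_j(z)}{\prod_{k\ne j}(1 - z^{-A_k})}\,dz \\
&= \sum_{j=1}^{n} \sum_{\alpha\in\mathbb{N}^m} \frac{Q_{j\alpha}}{(2\pi i)^m} \int_{|z_1|=\gamma_1}\cdots\int_{|z_m|=\gamma_m} \frac{z^{A_j+\alpha-e_m}}{\prod_{k\ne j}(1 - z^{-A_k})}\,dz.
\end{aligned}
$$

From (2.20) (with $b := 0$) we recognize in $B_1$ the number of solutions $x \in \mathbb{N}^n$
to the linear system $Ax = 0$, so that $B_1 = 1$. Next, again from (2.20) (now
with $b := A_j + \alpha$), each term

$$C_{j\alpha} := \frac{Q_{j\alpha}}{(2\pi i)^m} \int_{|z_1|=\gamma_1}\cdots\int_{|z_m|=\gamma_m} \frac{z^{A_j+\alpha-e_m}}{\prod_{k\ne j}(1 - z^{-A_k})}\,dz$$

is equal to

$Q_{j\alpha}$ $\times$ the number of integral solutions $x \in \mathbb{N}^{n-1}$
of the linear system $A^{(j)} x = A_j + \alpha$, where $A^{(j)}$ is the matrix in $\mathbb{N}^{m\times(n-1)}$
obtained from $A$ by deleting its $j$-th column $A_j$. As by hypothesis each $Q_{j\alpha}$
is nonnegative, it follows that

$$B_2 = \sum_{j=1}^{n} \sum_{\alpha\in\mathbb{N}^m} C_{j\alpha} \ge 0,$$

so that $\hat{f}_d(b,0) = B_1 + B_2 \ge 1$. In other words, the system $Ax = b$ has at least
one solution $x \in \mathbb{N}^n$.

(i) $\Rightarrow$ (ii). Let $x \in \mathbb{N}^n$ be a solution of $Ax = b$, and write

$$z^{b} - 1 = z^{A_1x_1} - 1 + z^{A_1x_1}\bigl(z^{A_2x_2} - 1\bigr) + \cdots + z^{\sum_{j=1}^{n-1} A_jx_j}\bigl(z^{A_nx_n} - 1\bigr)$$

and

$$z^{A_jx_j} - 1 = \bigl(z^{A_j} - 1\bigr)\bigl[1 + z^{A_j} + \cdots + z^{A_j(x_j-1)}\bigr], \quad j = 1,\ldots,n,$$

to obtain (2.38) with

$$Q_j(z) := z^{\sum_{k=1}^{j-1} A_kx_k}\bigl[1 + z^{A_j} + \cdots + z^{A_j(x_j-1)}\bigr], \quad j = 1,\ldots,n.$$

We immediately see that each $Q_j$ has all its coefficients nonnegative (and
even in $\{0, 1\}$).

Finally, the bound on the degree follows immediately from the proof for
(i) $\Rightarrow$ (ii).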
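The construction in the (i) $\Rightarrow$ (ii) direction is explicit enough to be checked by machine. The following Python sketch builds the weights $Q_j$ from a given solution $x \in \mathbb{N}^n$ and expands $\sum_j Q_j(z)(z^{A_j} - 1)$, which should collapse to $z^{b} - 1$. It assumes that (2.38) has this form, as the proof above suggests; polynomials are stored as dictionaries from exponent tuples to coefficients, and all function names are hypothetical.

```python
from collections import defaultdict

def poly_add(p, q):
    """Add two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int, p)
    for e, c in q.items():
        out[e] += c
    return {e: c for e, c in out.items() if c != 0}

def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[tuple(a + b for a, b in zip(e1, e2))] += c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def farkas_certificate(A, x):
    """Q_j(z) = z^{sum_{k<j} A_k x_k} * (1 + z^{A_j} + ... + z^{A_j (x_j - 1)})."""
    m, n = len(A), len(A[0])
    cols = [tuple(A[i][j] for i in range(m)) for j in range(n)]
    prefix = (0,) * m            # exponent of z^{sum_{k<j} A_k x_k}
    Q = []
    for j in range(n):
        Qj = {}
        for t in range(x[j]):
            e = tuple(p + t * a for p, a in zip(prefix, cols[j]))
            Qj[e] = Qj.get(e, 0) + 1
        Q.append(Qj)
        prefix = tuple(p + x[j] * a for p, a in zip(prefix, cols[j]))
    return Q, cols

def expand_certificate(Q, cols):
    """Expand sum_j Q_j(z) * (z^{A_j} - 1)."""
    zero = (0,) * len(cols[0])
    total = {}
    for Qj, Aj in zip(Q, cols):
        factor = poly_add({Aj: 1}, {zero: -1})
        total = poly_add(total, poly_mul(Qj, factor))
    return total

A = [[1, 2],
     [1, 0]]
x = [2, 1]                        # so that b = A x = (4, 2)
Q, cols = farkas_certificate(A, x)
print(Q)                          # Q_1 = 1 + z1*z2, Q_2 = z1^2 * z2^2
print(expand_certificate(Q, cols))  # z^(4,2) - 1
```

In this instance every coefficient of the $Q_j$ is 0 or 1, in line with the remark at the end of the proof.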
Chapter 3

Some nonlinear Lagrange and penalty 
functions for problems with a single 
constraint 


J. S. Giri and A. M. Rubinov


Abstract We study connections between generalized Lagrangians and gen- 
eralized penalty functions, which are formed by increasing positively homo- 
geneous functions. In particular we show that the least exact penalty param- 
eter is equal to the least Lagrange multiplier. We also prove, under some 
natural assumptions, that the natural generalization of a Lagrangian cannot 
improve it. 


Key words: Generalized Lagrangians, generalized penalty functions, single 
constraint, IPR convolutions, IPH functions 


3.1 Introduction 


Consider the following constrained optimization problem $P(f_0, f_1)$:

$$\min f_0(x) \ \text{ subject to } \ x \in X,\ f_1(x) \le 0, \tag{3.1}$$

where $X \subset \mathbb{R}^n$ and $f_0(x)$, $f_1(x)$ are real-valued, continuous functions. (We
shall assume that these functions are directionally differentiable in Section
3.4.) Note that a general mathematical programming problem:

$$\min f_0(x) \ \text{ subject to } \ x \in X,\ g_i(x) \le 0 \ (i \in I),\ h_j(x) = 0 \ (j \in J),$$

where $I$ and $J$ are finite sets, can be reformulated as (3.1) with

$$f_1(x) = \max\Bigl(\max_{i\in I} g_i(x),\ \max_{j\in J} |h_j(x)|\Bigr). \tag{3.2}$$

Note that the function $f_1$ defined by (3.2) is directionally differentiable if the
functions $g_i$ ($i \in I$) and $h_j$ ($j \in J$) possess this property.

The traditional approach to problems of this type has been to employ a
Lagrange function of the form

$$L(x; \lambda) = f_0(x) + \lambda f_1(x).$$

The function $q(\lambda) = \inf_{x\in X}(f_0(x) + \lambda f_1(x))$ is called the dual function and
the problem

$$\max q(\lambda) \ \text{ subject to } \ \lambda \ge 0$$

is called the dual problem. The equality

$$\sup_{\lambda \ge 0}\, \inf_{x\in X} L(x, \lambda) = \inf\{f_0(x) : x \in X,\ f_1(x) \le 0\}$$

is called the zero duality gap property. The number $\bar\lambda \ge 0$ such that

$$\inf_{x\in X} L(x, \bar\lambda) = \inf\{f_0(x) : x \in X,\ f_1(x) \le 0\}$$

is called the Lagrange multiplier.

Let $f_1^+(x) = \max(f_1(x), 0)$. Then the Lagrange function for the problem
$P(f_0, f_1^+)$ is called the penalty function for the initial problem $P(f_0, f_1)$.

The traditional Lagrange function may be considered to be a linear
convolution of the objective and constraint functions. That is,

$$L(x; \lambda) = p(f_0(x), \lambda f_1(x)),$$

where $p(u,v) = u + v$. It has been shown in [4, 5] that for penalty functions,
increasing positively homogeneous (IPH) convolutions provide exact
penalization for a large class of objective functions. The question thus arises
"are there nonlinear convolution functions for which Lagrange multipliers
exist?" The most interesting example of a nonlinear IPH convolution function
is the function $s_k(u,v) = (u^k + v^k)^{1/k}$. These convolutions also often
provide a smaller exact penalty parameter than does the traditional
linear convolution. (See Section 3.3 for the definition of an exact penalty
parameter.)

We will show in this chapter that for problems where a Lagrange multiplier
exists, an exact penalty parameter also exists, and the smallest exact penalty
parameter is equal to the smallest Lagrange multiplier.

We also show that whereas a generalized penalty function can often improve
the classical situation (for example, provide exact penalization with
a smaller parameter than that of the traditional function), this is not true
for generalized Lagrange functions. Namely, we prove, under some natural
assumptions, that among all functions $s_k$ the Lagrange multiplier may exist
only for the $k = 1$ case. So generalized Lagrangians cannot improve the
classical situation.


3.2 Preliminaries 


Let us present some results and definitions which we will make use of later in
this chapter. We will denote the optimal value of the general problem $P(f_0, f_1)$
by $M(f_0, f_1)$.

We will also make use of the sets $X_0 = \{x \in X : f_1(x) \le 0\}$ and $X_1 =
\{x \in X : f_1(x) > 0\}$.

It will be convenient to talk about Increasing Positively Homogeneous
(IPH) functions. These are defined as functions which are increasing, that is,
if $(\delta, \gamma) \ge (\delta', \gamma')$ then $p(\delta, \gamma) \ge p(\delta', \gamma')$, and positively homogeneous of the
first degree, that is, $p(\alpha(\delta, \gamma)) = \alpha\,p(\delta, \gamma)$, $\alpha > 0$.

We shall consider only continuous IPH functions defined on either the
half-plane $\{(u,v) : u \ge 0\}$ or on the quadrant $\mathbb{R}^2_+ = \{(u,v) \in \mathbb{R}^2 : u \ge 0, v \ge 0\}$.
In the latter case we consider only IPH functions $p : \mathbb{R}^2_+ \to \mathbb{R}_+$, which
possess the following properties:

$$p(1,0) = 1, \qquad \lim_{v\to+\infty} p(1,v) = +\infty.$$

We shall denote by $\mathcal{P}_1$ the class of all such functions. The simplest example
of functions from $\mathcal{P}_1$ is the following function $s_k$ ($0 < k < +\infty$), defined on
$\mathbb{R}^2_+$:

$$s_k(u,v) = (u^k + v^k)^{1/k}. \tag{3.3}$$

If $k = \frac{2l+1}{2m+1}$ with $l, m \in \mathbb{N}$ then the function $s_k$ is well defined and IPH
on the half-plane $\{(u,v) : u \ge 0\}$. (Here $\mathbb{N}$ is the set of positive integers.)

A perturbation function plays an important part in the study of extended
penalty functions and is defined on $\mathbb{R}_+ = \{y \in \mathbb{R} : y \ge 0\}$ by

$$\beta(y) = \inf\{f_0(x) : x \in X,\ f_1(x) \le y\}.$$

We denote by $C_X$ the set of all problems $(f_0, f_1)$ such that:

1. $\inf_{x\in X} f_0(x) > 0$;

2. there exists a sequence $x_k \in X_1$ such that $f_1(x_k) \to 0$ and $f_0(x_k) \to
M(f_0, f_1)$;

3. there exists a point $x \in X$ such that $f_1(x) \le 0$;

4. the perturbation function of the problem $(f_0, f_1)$ is l.s.c. at the point
$y = 0$.

An important result which follows from the study of perturbation functions
is as follows.

Theorem 1. Let $P(f_0, f_1) \in C_X$. Let $k > 0$ and let $p = p_k$ be defined as
$p_k(\delta, \gamma) = (\delta^k + \gamma^k)^{1/k}$. There exists a number $\bar{d} > 0$ such that $q_p^+(\bar{d}) =
M(f_0, f_1)$ if and only if $\beta$ is calm of degree $k$ at the origin. That is,

$$\liminf_{y\to+0} \frac{\beta(y) - \beta(0)}{y^k} > -\infty.$$


A proof of this is presented in [4] and [5]. 
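The convolution $s_k$ of (3.3) and the defining properties of the class it belongs to can be probed numerically. The Python sketch below restricts attention to the quadrant $u \ge 0$, $v \ge 0$ and spot-checks the normalization $p(1,0) = 1$, positive homogeneity and monotonicity at a few sample points; the sample values are arbitrary choices made for illustration, and a spot check is of course not a proof.

```python
def s_k(u, v, k):
    """The convolution (3.3), evaluated on the quadrant u >= 0, v >= 0."""
    return (u ** k + v ** k) ** (1.0 / k)

def is_homogeneous(k, u, v, scale, tol=1e-9):
    """Check s_k(scale * (u, v)) == scale * s_k(u, v) up to rounding."""
    return abs(s_k(scale * u, scale * v, k) - scale * s_k(u, v, k)) < tol

def is_increasing(k, pts):
    """Check s_k(u, v) >= s_k(u2, v2) whenever (u, v) >= (u2, v2) componentwise."""
    return all(s_k(u, v, k) >= s_k(u2, v2, k) - 1e-12
               for (u, v) in pts for (u2, v2) in pts
               if u >= u2 and v >= v2)

pts = [(0.5, 0.0), (1.0, 0.2), (1.0, 3.0), (2.0, 3.0), (4.0, 7.5)]
for k in (0.5, 1.0, 2.0):
    print(k, s_k(1.0, 0.0, k), is_homogeneous(k, 3.0, 4.0, 2.5), is_increasing(k, pts))
```

For $k = 1$ the function reduces to the linear convolution $u + v$ of the classical Lagrangian, and for $k = 2$ it is the Euclidean norm, so that `s_k(3.0, 4.0, 2)` is 5 up to rounding.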


3.3 The relationship between extended penalty 
functions and extended Lagrange functions 


Let $(f_0, f_1) \in C_X$ and let $p$ be an IPH function defined on the half-plane
$\mathbb{R}^2_* = \{(u,v) : u \ge 0\}$. Recall the following definitions. The Lagrange-type
function with respect to $p$ is defined by

$$L_p(x, d) = p(f_0(x), d f_1(x)).$$

(Here $d$ is a real number and $df_1$ does not mean the differential of $f_1$.) The
dual function $q_p(d)$ with respect to $p$ is defined by

$$q_p(d) = \inf_{x\in X} p(f_0(x), d f_1(x)), \quad d > 0.$$

Let $p^+$ be the restriction of $p$ to $\mathbb{R}^2_+$. Consider the penalty function $L_p^+$ and
the dual function $q_p^+$ corresponding to $p^+$:

$$L_p^+(x, d) = p^+(f_0(x), d f_1^+(x)), \quad (x \in X,\ d \ge 0),$$

$$q_p^+(d) = \inf_{x\in X} L_p^+(x, d), \quad (d \ge 0).$$

Note that if $f_1(x) = 0$ for $x \in X_0$ then $q_p = q_p^+$.

Let

$$t_p(d) = \inf_{x\in X_0} p(f_0(x), d f_1(x)). \tag{3.4}$$

Then

$$q_p(d) = \min(t_p(d), r_{p^+}(1, d)), \tag{3.5}$$

where $r_{p^+}$ is defined by

$$r_{p^+}(d_0, d) = \inf_{x\in X_1} p^+(d_0 f_0(x), d f_1(x)). \tag{3.6}$$

(The function $r_{p^+}$ was introduced and studied in [4] and [5].)
If the restriction $p^+$ of $p$ on $\mathbb{R}^2_+$ belongs to $\mathcal{P}_1$ then $r_{p^+}(1, d) = q_p^+(d)$ (see
[4, 5]), so

$$q_p(d) = \min(t_p(d), q_p^+(d)).$$

Note that the function $t_p$ is decreasing and

$$t_p(d) \le t_p(0) = M(f_0, f_1), \quad (d > 0).$$

The function $q_p^+(d) = r_{p^+}(1, d)$ is increasing. It is known (see [4, 5]) that

$$q_p^+(d) = r_{p^+}(1, d) \le \lim_{u\to+\infty} r_{p^+}(1, u) = M(f_0, f_1). \tag{3.7}$$


Recall that a positive number $\bar{d}$ is called a Lagrange multiplier of $P(f_0, f_1)$
with respect to $p$ if $q_p(\bar{d}) = M(f_0, f_1)$. A positive number $\bar{d}$ is called an exact
penalty parameter of $P(f_0, f_1)$ with respect to $p^+$ if $q_p^+(\bar{d}) = M(f_0, f_1)$. We
will now show that the following statement holds.

Theorem 2. Consider $(f_0, f_1) \in C_X$ and an IPH function $p$ defined on $\mathbb{R}^2_*$.
Assume that the restriction $p^+$ of $p$ to $\mathbb{R}^2_+$ belongs to $\mathcal{P}_1$. Then the following
assertions are equivalent:

1) there exists a Lagrange multiplier $\bar{d}$ of $P(f_0, f_1)$ with respect to $p$.
2) there exists an exact penalty parameter of $P(f_0, f_1)$ with respect to $p^+$
and

$$\max(t_p(d), r_{p^+}(1, d)) = M(f_0, f_1) \ \text{ for all } d \ge 0. \tag{3.8}$$

Proof. 1) $\Rightarrow$ 2). Let $\bar{d}$ be a Lagrange multiplier of $P(f_0, f_1)$. Then

$$\inf_{x\in X} p(f_0(x), \bar{d} f_1(x)) = M(f_0, f_1).$$

Since $p$ is an increasing function and $\bar{d} f_1^+(x) \ge \bar{d} f_1(x)$ for all $x \in X$, we have

$$
\begin{aligned}
q_p^+(\bar{d}) &= \inf_{x\in X} p^+(f_0(x), \bar{d} f_1^+(x)) \\
&= \inf_{x\in X} p(f_0(x), \bar{d} f_1^+(x)) \\
&\ge \inf_{x\in X} p(f_0(x), \bar{d} f_1(x)) = M(f_0, f_1).
\end{aligned}
$$

On the other hand, due to (3.7) we have $q_p^+(d) \le M(f_0, f_1)$ for all $d$. Thus
$q_p^+(\bar{d}) = M(f_0, f_1)$, that is, $\bar{d}$ is an exact penalty parameter of $P(f_0, f_1)$ with
respect to $p^+$.

Due to (3.5) we have

$$\min(t_p(\bar{d}), r_{p^+}(1, \bar{d})) = M(f_0, f_1).$$

Since $t_p(d) \le M(f_0, f_1)$ and $r_{p^+}(1, d) \le M(f_0, f_1)$, it follows that

$$t_p(\bar{d}) = M(f_0, f_1) \ \text{ and } \ r_{p^+}(1, \bar{d}) = M(f_0, f_1). \tag{3.9}$$

Since $t_p(d)$ is decreasing and $r_{p^+}(1, d)$ is increasing, (3.9) implies the
equalities

$$t_p(d) = M(f_0, f_1), \quad (0 \le d \le \bar{d}),$$
$$r_{p^+}(1, d) = M(f_0, f_1), \quad (\bar{d} \le d < +\infty),$$

which, in turn, implies (3.8).

2) $\Rightarrow$ 1). Assume now that (3.8) holds. Let

$$D_t = \{d : t_p(d) = M(f_0, f_1)\}, \qquad D_r = \{d : r_{p^+}(1, d) = M(f_0, f_1)\}.$$

Since $p$ is a continuous function it follows that $t_p$ is upper semicontinuous.
Also $M(f_0, f_1)$ is the greatest value of $t_p$ and this function is decreasing,
therefore it follows that the set $D_t$ is a closed segment with the left end-point
equal to zero. It should also be noted that the set $D_r$ is nonempty. Indeed,
since $p^+ \in \mathcal{P}_1$ it follows that $D_r$ contains an exact penalty parameter of $P(f_0, f_1)$
with respect to $p^+$. It is observed that the function $r_{p^+}(1, \cdot)$ is increasing
and upper semicontinuous and since $M(f_0, f_1)$ is the greatest value of this
function, it follows that $D_r$ is a closed segment. Due to (3.8) we can say that
$D_t \cup D_r = [0, +\infty)$. Since both $D_t$ and $D_r$ are closed segments, we conclude
that the set $D_l := D_t \cap D_r \ne \emptyset$. Let $\bar{d} \in D_l$ and therefore $t_p(\bar{d}) = M(f_0, f_1)$
and $r_{p^+}(1, \bar{d}) = M(f_0, f_1)$. Due to (3.5) we have $q_p(\bar{d}) = M(f_0, f_1)$.


Remark 1. Assume that $p^+ \in \mathcal{P}_1$ and an exact penalty parameter exists. It
easily follows from the second part of the proof of Theorem 2 that the set
of Lagrange multipliers coincides with the closed segment $D_l = D_t \cap D_r$.

Note that for penalty parameters the following assertion ($A_{pen}$) holds:

a number which is greater than an exact penalty parameter is also an exact
penalty parameter.

The corresponding assertion ($A_{lag}$):

a number which is greater than a Lagrange multiplier is also a Lagrange
multiplier,

does not hold in general. Assume that a Lagrange multiplier exists. Then
according to Theorem 2 an exact penalty parameter also exists. It follows
from Remark 1 that ($A_{lag}$) holds if and only if $D_t = [0, +\infty)$, that is,

$$\inf_{x\in X_0} p(f_0(x), d f_1(x)) = M(f_0, f_1) \ \text{ for all } d \ge 0. \tag{3.10}$$

We now point out two cases where (3.10) holds.


One of them is closely related to penalization. Let $p$ be an arbitrary IPH
function, such that $p(1,0) = 1$ and $f_1(x) = 0$ for all $x \in X_0$ (in other words,
$f_1^+ = f_1$), then (3.10) holds.

We now remove condition $f_1^+ = f_1$ and consider very special IPH functions,
for which (3.10) holds without this condition. Namely, we consider a class $\mathcal{P}_*$
of IPH functions defined on the half-plane $\mathbb{R}^2_* = \{(u,v) : u \ge 0\}$ such that
(3.10) holds for each problem $(f_0, f_1) \in C_X$.

The class $\mathcal{P}_*$ consists of functions $p : \mathbb{R}^2_* \to \mathbb{R}$, such that the restriction
of $p$ on the cone $\mathbb{R}^2_+$ belongs to $\mathcal{P}_1$ and $p(u,v) = u$ for $(u,v) \in \mathbb{R}^2_*$ with
$v \le 0$.

It is clear that each $p \in \mathcal{P}_*$ is positively homogeneous of the first degree. Let
us now describe some further properties of $p$. Let $(u,v) \ge (u',v')$. Assuming
without loss of generality that $v \ge 0$, $v' \le 0$ we have

$$p(u,v) \ge p(u',0) = u' = p(u',v')$$

so $p$ is increasing. Since $p(u,0) = u$, it follows that $p$ is continuous. Thus
$\mathcal{P}_*$ consists of IPH continuous functions. The simplest example of a function
$p \in \mathcal{P}_*$ is $p(u,v) = \max(u, av)$ with $a > 0$. Clearly the function

$$p_k(u,v) = \max\bigl((u^k + a v^k)^{1/k},\ u\bigr)$$

with $k = \frac{2l+1}{2m+1}$, $l, m \in \mathbb{N}$, belongs to $\mathcal{P}_*$ as well.

Let us check that (3.10) holds for each $(f_0, f_1) \in C_X$. Indeed, since $f_0(x) >
0$ for all $x \in X$, we have

$$\inf_{x\in X_0} p(f_0(x), d f_1(x)) = \inf_{x\in X_0} f_0(x) = M(f_0, f_1) \ \text{ for all } d \ge 0.$$
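Condition (3.10) can be compared numerically for the convolution $p(u,v) = \max(u, av)$ and for the classical linear convolution $p(u,v) = u + v$. The Python sketch below evaluates the infimum over the feasible set on a grid for a hypothetical test problem (a parabola $f_0(x) = (3 - x)^2$ with constraint $f_1(x) = x - 1 \le 0$ on $X = [0, 2]$, so that the feasible set is $[0, 1]$); the problem data, the grid and the function names are assumptions made for illustration only.

```python
def p_max(u, v, slope=1.0):
    """p(u, v) = max(u, slope * v); equals u whenever v <= 0 and u >= 0."""
    return max(u, slope * v)

def p_linear(u, v):
    """Classical linear convolution p(u, v) = u + v."""
    return u + v

def t_p(p, d, f0, f1, X0):
    """Grid approximation of inf over X0 of p(f0(x), d * f1(x)), cf. (3.4)."""
    return min(p(f0(x), d * f1(x)) for x in X0)

# Hypothetical test problem: minimize (3 - x)^2 subject to x - 1 <= 0.
f0 = lambda x: (3.0 - x) ** 2
f1 = lambda x: x - 1.0
X0 = [i / 1000 for i in range(1001)]       # grid on the feasible set [0, 1]
M = min(f0(x) for x in X0)                 # optimal value, attained at x = 1

for d in (0.0, 1.0, 10.0, 100.0):
    print(d, t_p(p_max, d, f0, f1, X0), t_p(p_linear, d, f0, f1, X0))
```

In this example the first column of values stays at $M$ for every $d$, as (3.10) requires, whereas the linear convolution drops below $M$ once $d$ is large; this is the reason why a number larger than a classical Lagrange multiplier need not be a Lagrange multiplier.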


3.4 Generalized Lagrange functions 


In this section we consider problems $P(f_0, f_1)$ such that both $f_0$ and $f_1$ are
directionally differentiable functions defined on a set $X \subset \mathbb{R}^n$. Recall that
a function $f$ defined on $X$ is called directionally differentiable at a point
$x \in \operatorname{int} X$ if for each $z \in \mathbb{R}^n$ there exists the derivative $f'(x,z)$ at the point
$x$ in the direction $z$:

$$f'(x,z) = \lim_{\alpha\to+0} \frac{1}{\alpha}\bigl(f(x + \alpha z) - f(x)\bigr).$$

Usually only directionally differentiable functions with a finite derivative are
considered. We also accept functions whose directional derivative can attain
the values $\pm\infty$. It is well known (see, for example, [1]) that the maximum of
two directionally differentiable functions is also directionally differentiable. In
particular the function $f^+$ is directionally differentiable, if $f$ is directionally
differentiable. Let $f(x) = 0$. Then

$$(f^+)'(x,z) = \max(f'(x,z), 0) = (f'(x,z))^+.$$

Let $s_k$, $k > 0$, be a function defined on $\mathbb{R}^2_+$ by (3.3). Assume that there
exists an exact penalty parameter for a problem $P(f_0, f_1)$ with $(f_0, f_1) \in C_X$.
It easily follows from results in [5, 6] that an exact penalty parameter with
respect to $k' < k$ also exists and that the smallest exact penalty parameter
$\bar{d}_{k'}$ with respect to $s_{k'}$ is smaller than the smallest exact penalty parameter
$\bar{d}_k$ with respect to $s_k$. The question then arises, does this property hold for
Lagrange multipliers? (This question makes sense only for $k = \frac{2l+1}{2m+1}$ with
$l, m \in \mathbb{N}$ and functions $s_k$ defined by (3.3) on the half-plane $\mathbb{R}^2_*$.) We provide
a proof that the answer to this question is, in general, negative.


Let $f$ be a directionally differentiable function defined on a set $X$ and let
$x \in \operatorname{int} X$. We say that $x$ is a min-stationary point of $f$ on $X$ if for each
direction $z$ either $f'(x,z) \ge 0$ or $f'(x,z) = +\infty$. We now present a simple
example.


Example 1. Let X = R, 


pO et fate) = { x ifa <0, 


_ f-ve ife>0, 
fale) = { —a ifa<0. 


Then the point $x = 0$ is a min-stationary point for $f_1$ and $f_2$, but this point
is not min-stationary for $f_3$.

Proposition 1. (Necessary condition for a local minimum). Let $x \in \operatorname{int} X$
be a local minimizer of a directionally differentiable function $f$. Then $x$ is a
min-stationary point of $f$.

Proof. Indeed, for all $z \in \mathbb{R}^n$ and sufficiently small $\alpha > 0$ we have
$(1/\alpha)(f(x + \alpha z) - f(x)) \ge 0$. Thus the result follows.


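Consider a problem $P(f_0, f_1)$ where $(f_0, f_1) \in C_X$ are functions with finite
directional derivatives. Consider the IPH function $s_k$ defined by (3.3). Let us
define the corresponding Lagrange-type function $L_{s_k}$:

$$L_{s_k}(x, \lambda) = f_0(x)^k + \lambda f_1(x)^k. \tag{3.11}$$

We have for $x \in X$ such that $f_1(x) \ne 0$ that

$$L'_{s_k}(x, z; \lambda) = k f_0(x)^{k-1}(f_0)'(x, z) + \lambda k f_1(x)^{k-1}(f_1)'(x, z). \tag{3.12}$$

Assume now that $f_1(x) = 0$. We consider the following cases separately.

1) $k > 1$. Then

$$L'_{s_k}(x, z; \lambda) = k f_0(x)^{k-1}(f_0)'(x, z). \tag{3.13}$$

2) $k = 1$. Then

$$L'_{s_k}(x, z; \lambda) = (f_0)'(x, z) + \lambda (f_1)'(x, z). \tag{3.14}$$

3) $k < 1$. First we calculate the limit

$$\lim_{\alpha\to+0} \frac{1}{\alpha} f_1(x + \alpha z)^k = \lim_{\alpha\to+0} \frac{1}{\alpha}\bigl(f_1(x) + \alpha f_1'(x, z) + o(\alpha)\bigr)^k = \lim_{\alpha\to+0} \frac{1}{\alpha}\bigl(\alpha f_1'(x, z) + o(\alpha)\bigr)^k.$$

We have

$$\lim_{\alpha\to+0} \frac{1}{\alpha} f_1(x + \alpha z)^k = \begin{cases} +\infty & \text{if } f_1'(x, z) > 0, \\ 0 & \text{if } f_1'(x, z) = 0, \\ -\infty & \text{if } f_1'(x, z) < 0. \end{cases}$$

Hence

$$L'_{s_k}(x, z; \lambda) = \begin{cases} +\infty & \text{if } f_1'(x, z) > 0, \\ k f_0(x)^{k-1}(f_0)'(x, z) & \text{if } f_1'(x, z) = 0, \\ -\infty & \text{if } f_1'(x, z) < 0. \end{cases} \tag{3.15}$$

Note that for problems $P(f_0, f_1)$ with $(f_0, f_1) \in C_X$ a minimizer is located
on the boundary of the set of feasible elements $\{x : f_1(x) \le 0\}$.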


Proposition 2. Let $k > 1$. Let $(f_0, f_1) \in C_X$. Assume that the functions
$f_0$ and $f_1$ have finite directional derivatives at a point $\bar{x} \in \operatorname{int} X$, which is a
minimizer of the problem $P(f_0, f_1)$. Assume that

$$\text{there exists } u \in \mathbb{R}^n \text{ such that } (f_0)'(\bar{x}, u) < 0, \tag{3.16}$$

(that is, $\bar{x}$ is not a min-stationary point for the function $f_0$ over $X$). Then
the point $\bar{x}$ is not a min-stationary point of the function $L_{s_k}$ for each $\lambda > 0$.

Proof. Assume that $\bar{x}$ is a min-stationary point of the function $L_{s_k}(x; \lambda)$ over
$X$. Then combining Proposition 1 and (3.13) we have

$$f_0(\bar{x})^{k-1}(f_0)'(\bar{x}, z) \ge 0, \quad z \in \mathbb{R}^n.$$

Since $f_0(\bar{x}) > 0$ it follows that $(f_0)'(\bar{x}, z) \ge 0$ for all $z$, which contradicts
(3.16).

It follows from this proposition that the Lagrange multiplier with respect
to $L_{s_k}$ ($k > 1$) does not exist for a problem $P(f_0, f_1)$ if (3.16) holds. Condition
(3.16) means that the constraint $f_1(x) \le 0$ is essential, that is, a minimum
under this constraint does not remain a minimum without it.


Remark 2. Consider a problem $P(f_0, f_1)$ with $(f_0, f_1) \in C_X$. Then under
some mild assumptions there exists a number $k > 1$ such that the zero duality
gap property holds for the problem $P(f_0^k, f_1^k)$ with respect to the classical
Lagrange function (see [2]). This means that

$$\sup_{\lambda > 0}\, \inf_{x\in X} \bigl(f_0^k(x) + \lambda f_1^k(x)\bigr) = \inf_{x\in X:\, f_1(x)\le 0} f_0^k(x).$$

Clearly this is equivalent to

$$\sup_{\lambda > 0}\, \inf_{x\in X} s_k(f_0(x), \lambda f_1(x)) = \inf_{x\in X:\, f_1(x)\le 0} f_0(x),$$

that is, the zero duality gap property with respect to $s_k$ holds. It follows
from Proposition 2 that a Lagrange multiplier with respect to $s_k$ does not
exist. Hence there is no Lagrange multiplier for $P(f_0^k, f_1^k)$ with respect to
the classical Lagrange function.

Remark 3. Let $g(x) = f_1^+(x)$. Then the penalty-type function for $P(f_0, f_1)$
with respect to $s_k$ coincides with the Lagrange-type function for $P(f_0, g)$ with
respect to $s_k$. Hence an exact penalty parameter with respect to this penalty
function does not exist if (3.16) holds.


Proposition 3. Let $k < 1$ and let $(f_0, f_1) \in C_X$. Assume that the functions
$f_0$ and $f_1$ have finite directional derivatives at a point $\bar{x} \in \operatorname{int} X$, which is a
minimizer for the problem $P(f_0, f_1)$. Assume that

$$\text{there exists } u \in \mathbb{R}^n \text{ such that } (f_1)'(\bar{x}, u) < 0, \tag{3.17}$$

(that is, $\bar{x}$ is not a min-stationary point for the function $f_1$ over $X$). Then
the point $\bar{x}$ is not a min-stationary point of the function $L_{s_k}$ for each $\lambda > 0$.

Proof. Assume that $\bar{x}$ is a min-stationary point of $L_{s_k}$. Then by (3.15) and
(3.17) the directional derivative of $L_{s_k}$ at $\bar{x}$ in the direction $u$ equals $-\infty$,
which contradicts the definition of a min-stationary point.

It follows from this proposition that a Lagrange multiplier with respect
to $L_{s_k}$, $k < 1$, does not exist if condition (3.17) holds. We now give the
simplest example, when (3.17) is valid. Let $f_1$ be a differentiable function
and $\nabla f_1(\bar{x}) \ne 0$. Then (3.17) holds.

Consider now a more complicated and interesting example. Let $f_1(x) =
\max_{i\in I} g_i(x)$, where $g_i$ are differentiable functions. Then $f_1$ is a directionally
differentiable function and $f_1'(\bar{x}, u) = \max_{i\in I(\bar{x})}[\nabla g_i(\bar{x}), u]$, where $I(\bar{x}) = \{i \in
I : g_i(\bar{x}) = f_1(\bar{x})\}$ and $[x, y]$ stands for the inner product of vectors $x$ and $y$.
Thus (3.17) holds in this case if and only if there exists a vector $u$ such that
$[\nabla g_i(\bar{x}), u] < 0$ for all $i \in I(\bar{x})$. To understand the essence of this result, let us
consider the following mathematical programming problem with $m$ inequality
constraints:

$$\min f_0(x) \ \text{ subject to } \ g_i(x) \le 0,\ i \in I = \{1,\ldots,m\}. \tag{3.18}$$

We can present (3.18) as the problem $P(f_0, f_1)$ with $f_1(x) = \max_{i\in I} g_i(x)$.
Recall the well-known Mangasarian-Fromovitz (MF) constraint qualification
for (3.18) (see, for example, [3]): (MF) holds at a point $\bar{x}$ if there exists a
vector $u \in \mathbb{R}^n$ such that $[\nabla g_i(\bar{x}), u] < 0$ for all $i \in I$ such that $g_i(\bar{x}) = 0$. Thus
(3.17) for $P(f_0, f_1)$ is equivalent to (MF) constraint qualification for (3.18). In
other words, if (MF) constraint qualification holds then a Lagrange multiplier
for $L_{s_k}$ with $k < 1$ does not exist. (It is known that (MF) implies the existence
of a Lagrange multiplier with $k = 1$.)

Let $(f_0, f_1) \in C_X$, where $f_0$, $f_1$ are functions with finite directional
derivatives. Let $g = f_1^+$ and $x$ be a point such that $f_1(x) = 0$. Then $g'(x,z) =
\max(f_1'(x,z), 0) \ge 0$ for all $z$, hence (3.17) does not hold for the problem
$P(f_0, g)$. This means that Proposition 3 could not be applied to a penalty
function for $P(f_0, f_1)$ with respect to $s_k$.

Simple examples show that exact penalty parameters with respect to s, 
with k < 1 can exist. We now present an example from [5]. We do not provide 
any details. (These can be found in [5], Example 4.6.) 


Example 2. Let $0 < b < c < a$ be real numbers and $X = [0, c]$. Let $f_0(x) =
(a - x)^2$, $f_1(x) = x - b$, so $P(f_0, f_1)$ coincides with the following problem:

$$\text{minimize } (a - x)^2 \ \text{ subject to } \ x \le b,\ x \in X.$$

Let $k = 1$. Then an exact penalty parameter exists and the least exact penalty
parameter $\bar{d}_1$ is equal to $2(a - b)$. Let $k = 1/2$. Then an exact penalty parameter also
exists and the least exact penalty parameter $\bar{d}_{1/2}$ coincides with $c - b$. We
indicate the following two points:

1) $\bar{d}_1$ does not depend on the set $X$; $\bar{d}_{1/2}$ depends on this set.
2) $\bar{d}_1$ depends on the parameter $a$, that is on the turning point of the parabola;
$\bar{d}_{1/2}$ does not depend on this parameter.
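The two thresholds quoted in Example 2 can be checked on a grid. The Python sketch below evaluates the penalty-type dual function for the convolution $(u^k + v^k)^{1/k}$ with the hypothetical data $a = 3$, $b = 1$, $c = 2$, for which the stated least exact penalty parameters are $2(a - b) = 4$ for $k = 1$ and $c - b = 1$ for $k = 1/2$. The function names, the data and the grid size are illustrative assumptions; a grid search only approximates the infimum.

```python
def s_k(u, v, k):
    """Convolution (u^k + v^k)^(1/k) on the quadrant u >= 0, v >= 0."""
    return (u ** k + v ** k) ** (1.0 / k)

def q_plus(k, d, a, b, c, n=2000):
    """Grid approximation of inf over X = [0, c] of s_k(f0(x), d * f1^+(x))
    for f0(x) = (a - x)^2 and f1(x) = x - b."""
    best = float("inf")
    for i in range(n + 1):
        x = c * i / n
        f0 = (a - x) ** 2
        f1_plus = max(x - b, 0.0)
        best = min(best, s_k(f0, d * f1_plus, k))
    return best

a, b, c = 3.0, 1.0, 2.0
M = (a - b) ** 2                      # optimal value, attained at x = b

print(q_plus(1, 4.0, a, b, c))        # d = 2(a - b): value M is reached
print(q_plus(1, 3.5, a, b, c))        # slightly smaller d: value drops below M
print(q_plus(0.5, 1.0, a, b, c))      # d = c - b: value M is reached
print(q_plus(0.5, 0.9, a, b, c))      # slightly smaller d: value drops below M
```

Changing `c` moves the threshold for $k = 1/2$ but not the one for $k = 1$, while changing `a` does the opposite, which is the content of points 1) and 2) above.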


3.5 Example 
Consider the following one-dimensional optimization problem: 
g- 02" OTe 


subject to $f_1(x) = x - 2 \le 0$, $x \in X = [0, 4]$.


A graphical representation of this problem is given in Figure 3.1 where 
the shaded area represents the product of the feasible region and the axis 
{(0,y):y € R}. 

It can easily be shown that for this problem $M(f_0, f_1) = 2$ at $\bar{x} = 2$.


3.5.1 The Lagrange function approach 


The corresponding extended Lagrangian for (3.20) is 


Lo, (2, A) = sx(fo(x), Afi(x)) = ((2° += +5)* + AK(a —2)*)E, 
recalling that $k = \frac{2l+1}{2m+1}$.

Fig. 3.1 $P(f_0, f_1)$ (plot of the objective function $f_0$).

Now consider de al aL 
Oi OE ——~ f(z), 3.20 
Te = 7p fol@) + AGA) (3.20) 
An easy calculation shows that 
= k> 1, 
dL _ : 
—(@)=4 -£4+X,6=1, (3.21) 
oO, kel. 


From this it is clear that an exact Lagrange multiplier $\lambda = 3$ may exist
only for the case $k = 1$.

Remark 4. In fact Figure 3.2 shows that in this example $\lambda = 3$ provides a
local minimum at $\bar{x}$ for $k = 1$ but not a global minimum, therefore it follows
that no exact Lagrange multiplier exists for this problem.


3.5.2 Penalty function approach 
The corresponding penalty function for (3.20) is 
£3, (ed) = (fo + OFT) 


_ [22 4 B45 4 ee 2)", fore > 2, 
= 2 
og — 4 Te 4 5, for x < 2. 


Fig. 3.2 L(x; 2). 


By Theorem 13.6 it can easily be shown that an exact penalty parameter 
exists when k < 1. This is shown in Figure 3.3 where an exact penalty 


parameter, d = 1, is used. 


Fig. 3.3 LI, (#;1). 
3 
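
A small numerical experiment illustrates the contrast between $k < 1$ and $k = 1$. The sketch assumes the objective $f_0(x) = x^3 - \tfrac{9}{2}x^2 + \tfrac{7}{2}x + 5$, the penalty form $s_k(f_0, d f_1^{+})$, the hypothetical choice $k = 1/3$, and a grid search over $X = [0, 4]$; none of these specifics is confirmed by the source.

```python
# Assumed objective of Example 3.5 (a reconstruction, not confirmed by the source).
def f0(x):
    return x**3 - 4.5 * x**2 + 3.5 * x + 5.0

def penalty(x, d, k):
    # s_k(f0(x), d * f1^+(x)) with f1(x) = x - 2; f0 is positive on [0, 4],
    # so the fractional powers below are well defined.
    f1_plus = max(x - 2.0, 0.0)
    return (f0(x) ** k + (d * f1_plus) ** k) ** (1.0 / k)

def grid_minimiser(d, k, points=4000):
    # Uniform grid on X = [0, 4]; it contains x = 2 exactly.
    grid = [4.0 * i / points for i in range(points + 1)]
    values = [penalty(x, d, k) for x in grid]
    best = min(values)
    return grid[values.index(best)], best

x_third, v_third = grid_minimiser(d=1.0, k=1.0 / 3.0)
x_one, v_one = grid_minimiser(d=1.0, k=1.0)

print("k = 1/3, d = 1:", x_third, v_third)
print("k = 1,   d = 1:", x_one, v_one)
```

With these assumptions the minimiser for $k = 1/3$ sits at the constrained solution $x = 2$, whereas for $k = 1$ the same parameter $d = 1$ lets the minimiser drift into the infeasible region.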


From these results we have shown that whereas the adoption of extended
penalty functions of the form $s_k$ yields an improvement to the traditional
penalty function approach, this cannot be generalized to improve the
Lagrange approach.

Chapter 4


Convergence of truncates in $\ell^1$ optimal
feedback control


Robert Wenczel, Andrew Eberhard and Robin Hill 


Abstract Existing design methodologies based on infinite-dimensional linear 
programming generally require an iterative process often involving progres- 
sive increase of truncation length, in order to achieve a desired accuracy. In 
this chapter we consider the fundamental problem of determining a priori es- 
timates of the truncation length sufficient for attainment of a given accuracy 
in the optimal objective value of certain infinite-dimensional linear programs 
arising in optimal feedback control. The treatment here also allows us to con- 
sider objective functions lacking interiority of domain, a problem which often 
arises in practice. 


Key words: $\ell^1$-feedback control, epi-distance convergence, truncated convex
programs


4.1 Introduction 


In the literature on feedback control there exist a number of papers addressing
the problem of designing a controller to optimize the response of a system
to a fixed input. In the discrete-time context there are many compelling
reasons for using error-sequences in the space $\ell^1$ as the basic variable, and
using various measures of performance (see, for example, [15]), which include
the $\ell^1$-norm (see, for example, [11]). The formulation of $\ell^1$-problems leads
to the computation of $\inf_M f$ (and determination of optimal elements, if
any), where $M$ is an affine subspace of a suitable product $X$ of $\ell^1$ with
itself. This subspace is generated by the YJBK (or Youla) parameterization
[24] for the set of all stabilizing feedback-controllers for a given linear,
time-invariant system. The objective $f$ will typically be the $\ell^1$-norm (see [6],
[11]). As is now standard in the literature on convex optimization ([20],
[21]), we will use convex discontinuous extended-real-valued functions, of the
form $f = \|\cdot\|_1 + \delta_C$, where $\delta_C$ is the indicator function (identically zero on $C$,
identically $+\infty$ elsewhere) of some closed convex set $C$ in $\ell^1$. This formalism is
very flexible, encompassing many problem formats, including the case of
time-domain-template constraints ([10, Chapter 14], [14]). These represent bounds
on signal size, of the form $B_i \le e_i \le A_i$ (for all $i$), where $e = \{e_i\}_{i=0}^{\infty}$ denotes
the error signal. Often there are also similar bounds on the control signal $u$. As
the Youla parameterization generates controllers having rational z-transform,
the variables in $M$ should also be taken to be rational in this sense, if the
above infimum is to equal the performance limit [6] for physically realizable
controllers. (If this condition is relaxed, then $\inf_M f$ provides merely a lower
bound for the physical performance limit.) The set $M$ may be recharacterized
by a set of linear constraints and thus forms an affine subspace of $X$. The
approach in many of the above references is to evaluate $\inf_M f$ by use of
classical Lagrangian-type duality theory for such minimization problems. The
common assumption is that the underlying space $X$ is $\ell^1$, or a product thereof,
and that $M$ is closed, forcing it to contain elements with non-rational
z-transform, counter to the “physical” model.

Consequently, $\inf_M f$ may not coincide with the performance limit for
physically realizable (that is, rational) controllers. However, in the context
of most of the works cited above, such an equality is actually easily
established (as was first noted in [27]). Indeed, whenever it is assumed that $C$ has
nonempty interior, on combining this with the density in $M$ of the subset
consisting of its rational members [19], we may deduce (see Lemma 15 below)
that the rational members of $C \cap M$ are $\ell^1$-dense in $C \cap M$. This yields the
claimed equality for any continuous objective function (such as the $\ell^1$-norm).

Use of the more modern results of conjugate duality permits the extension
of the above approach to a more general class of $(-\infty, +\infty]$-valued objective
functions $f$. They are applicable even when $\operatorname{int} C$ may vanish. In this case,
the question of whether $\inf_M f$ equals the physical limit becomes nontrivial
(in contrast to the case when $C$ has interior). Indeed, if $\inf_M f$ is strictly less
than the physical limit, any result obtained by this use of duality is arguably
of questionable engineering significance.

It is therefore important to know when $\inf_M f$ is precisely the performance
limit for physically realizable controllers, to ensure that results obtained via
the duality approaches described above are physically meaningful. Note that
this question is posed in the primal space, and may be analyzed purely in the
primal space. This question will be the concern of this chapter.

In this paper, we derive conditions on the system, and on the time-domain
template set $C$, that ensure $\inf_M (f_0 + \delta_C) = \inf_{C \cap M} f_0$ is indeed the
performance limit for physically realizable controllers, for various convex
lower-semicontinuous performance measures $f_0$. Here we only treat a class of
1-input/2-output problems (referred to as “two-block” in the control
literature), although the success of these methods for this case strongly suggests
the possibility of a satisfactory extension to multivariable systems.

Existing results on this question (see, for example, [14, Theorem 5.4]) 
rely on Lagrangian duality theory, and thereby demand that the time-domain 
template C has interior. Here, for the class of two-block problems treated, we 
remove this interiority requirement. Our result will be obtained by demon- 
strating the convergence of a sequence of truncated (primal) problems. More- 
over, this procedure will allow the explicit calculation of convergence esti- 
mates, unlike all prior works with the exception of [23]. (This latter paper 
estimates bounds on the truncation length for a problem with $H_\infty$-norm
constraints and uses state-space techniques, whereas our techniques are quite 
distinct.) The approach followed in our chapter has two chief outcomes. First, 
it validates the duality-based approach in an extended context, by ensuring 
that the primal problem posed in the duality recipe truly represents the 
limit-of-performance for realizable controllers. Secondly, it provides a com- 
putational alternative to duality itself, by exhibiting a convergent sequence of 
finite-dimensional primal approximations with explicit error estimates along 
the sequence. (This contrasts with traditional “primal-dual” approximation 
schemes, which generally do not yield explicit convergence rates.) 

This will be achieved by the use of some recently developed tools in
optimization theory (the relevant results of which are catalogued in Section 4.5),
namely the notion of epi-distance (or Attouch-Wets) convergence for convex
sets and functions [2, 3, 4, 5]. Epi-distance convergence has the feature that
if $f_n$ converges to $f$ in this sense, then subject to some mild conditions,

$\left| \inf_X f_n - \inf_X f \right| \le d(f_n, f)$,

where $d(f, g)$ is a metric describing this mode of convergence. Since the
$\ell^1$-control problem is expressible as $\inf_X (f + \delta_M)$, this leads naturally to the
question of the Attouch-Wets convergence of sums $f_n + \delta_{M_n}$ of sequences of
functions (where $f_n$ and $M_n$ are approximations to $f$ and $M$ respectively). A
result from [5], which estimates the “epi-distances” $d(f_n + g_n, f + g)$ between
sums of functions, in terms of sums of $d(f_n, f)$ and $d(g_n, g)$, is restated and
modified to suit our purpose. This result requires that the objective and
constraints satisfy a so-called “constraint qualification” (CQ).

In Section 4.6 some conditions on the template $C$ and on $M$ are derived
that ensure that the CQ holds. Also, the truncated problems will be defined,
and some fundamental limitations on the truncation scheme (in relation to
satisfiability of the CQ) will be discussed. Specifically, the basic optimization
will be formulated over two variables $e$ and $u$ in $\ell^1$. The truncated
approximate problems will be formed by truncating in the $e$ variable only, since the
CQ will be seen to fail under any attempt to truncate in both variables. Thus
in these approximate problems, the set of admissible $(e, u)$ pairs will contain
a $u$ of infinite length. Despite this, it will be noted that the set of such $(e, u)$’s
will be generated by a finite basis (implied by the $e$’s), so these approximates
are truly finite-dimensional.

Finally, in Section 4.7, the results of Section 4.5 will be applied to deduce
convergence of a sequence of finite-dimensional approximating minimizations.
This will follow since the appropriate distances $d(C_n, C)$ can be almost
trivially calculated. Also, we observe that for a sufficiently restricted class of
systems, our truncation scheme is equivalent to a simultaneous truncation in
both variables $e$ and $u$. In fact, this equivalence is satisfied precisely when
the system has no minimum-phase zeros.

The motivation for this work arose from a deficiency in current computa- 
tional practices which use simultaneous primal and dual truncations. These 
yield upper and lower bounds for the original primal problem, but have the 
disadvantage of not providing upper estimates on the order of truncation 
sufficient for attainment of a prescribed accuracy. 


4.2 Mathematical preliminaries 


We let $\overline{\mathbb{R}}$ stand for the extended reals $[-\infty, +\infty]$. For a Banach space $X$,
balls in $X$ centered at 0 will be written as $B(0, \rho) = \{x \in X \mid \|x\| < \rho\}$ and
$\overline{B}(0, \rho) = \{x \in X \mid \|x\| \le \rho\}$. Corresponding balls in the dual space $X^*$ will
be denoted $B^*(0, \rho)$ and $\overline{B}^*(0, \rho)$ respectively. The indicator function of a set
$A \subset X$ will be denoted $\delta_A$. We will use u.s.c. to denote upper-semicontinuity
and l.s.c. to denote lower-semicontinuity. Recall that a function $f : X \to \overline{\mathbb{R}}$
is called proper if never equal to $-\infty$ and not identically $+\infty$, and proper
closed if it is also l.s.c. For a function $f : X \to \overline{\mathbb{R}}$, the epigraph of $f$, denoted
$\operatorname{epi} f$, is the set $\{(x, \alpha) \in X \times \mathbb{R} \mid f(x) \le \alpha\}$. The domain, denoted $\operatorname{dom} f$,
is the set $\{x \in X \mid f(x) < +\infty\}$. The (sub-)level set $\{x \in X \mid f(x) \le \alpha\}$
(where $\alpha > \inf_X f$) will be given the abbreviation $\{f \le \alpha\}$. For $\epsilon > 0$, and
if $\inf_X f$ is finite, $\epsilon$-$\operatorname{argmin} f = \{x \in X \mid f(x) \le \inf_X f + \epsilon\}$ is the set of
$\epsilon$-approximate minimizers of $f$. Any product $X \times Y$ of normed spaces will always
be understood to be endowed with the box norm $\|(x, y)\| = \max\{\|x\|, \|y\|\}$;
any balls in such product spaces will always be with respect to the box norm.

Here $\ell^1(\mathbb{C})$ denotes the Banach space of all complex sequences $a = \{a_n\}_{n=0}^{\infty}$
such that $\|a\|_1 := \sum_{n=0}^{\infty} |a_n|$ is finite;

$\ell^1$ denotes the Banach space of all real sequences in $\ell^1(\mathbb{C})$; and $\ell^\infty$ denotes
the Banach space of all real sequences $a = \{a_n\}_{n=0}^{\infty}$ such that
$\|a\|_\infty := \sup_n |a_n|$ is finite.

For two sequences $a$, $b$ their convolution $a * b$ is the sequence
$(a * b)_i = \sum_{j=0}^{i} a_j b_{i-j}$.

The length of a sequence $a$, denoted $l(a)$, is the smallest integer $n$ such
that $a_i = 0$ for all $i \ge n$.

We define $\overline{\mathbb{D}}$ to be the closed unit disk $\{z \in \mathbb{C} \mid |z| \le 1\}$ in the complex
plane; $\mathbb{D}$ is the open unit disk $\{z \in \mathbb{C} \mid |z| < 1\}$.

The z-transform of $a = \{a_n\}_{n=0}^{\infty}$ is the function $\hat{a}(z) = \sum_{n=0}^{\infty} a_n z^n$ for
complex $z$ wherever it is defined. The inverse z-transform of $\hat{a}$ will be written
as $Z^{-1}(\hat{a})$.
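
The definitions of convolution, length and z-transform can be exercised on finite sequences. The sketch below stores a finite sequence as a non-empty Python list, reads the length as the smallest $n$ with $a_i = 0$ for all $i \ge n$, and checks at one sample point that the z-transform turns convolution into multiplication. The helper names are illustrative only.

```python
def convolve(a, b):
    """(a * b)_i = sum_{j=0}^{i} a_j b_{i-j} for non-empty finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out

def length(a):
    """Smallest n such that a_i = 0 for all i >= n."""
    n = len(a)
    while n > 0 and a[n - 1] == 0:
        n -= 1
    return n

def z_transform(a, z):
    """Evaluate sum_n a_n z^n for a finite sequence using Horner's rule."""
    acc = 0
    for coeff in reversed(a):
        acc = acc * z + coeff
    return acc

a = [1, 2, 0]
b = [3, -1]
c = convolve(a, b)
z = 0.5 + 0.25j

print(c, length(c))
print(abs(z_transform(c, z) - z_transform(a, z) * z_transform(b, z)))
```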

Also $\hat{\ell}^1$ denotes the set of all z-transforms of sequences in $\ell^1$. It can be
regarded as a subset of the collection of all continuous functions on $\overline{\mathbb{D}}$ that
are analytic on $\mathbb{D}$.

We use $R_\infty$ to denote the set of all rational functions of a complex variable,
with no poles in $\overline{\mathbb{D}}$.


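Definition 1. Let $A$ be a convex set in a topological vector space and $x \in A$.
Then


1 $\operatorname{cone} A = \bigcup_{\lambda \ge 0} \lambda A$ (the smallest convex cone containing $A$);
2 The core or algebraic interior of $A$ is characterized as $x \in \operatorname{core} A$ iff $\forall y \in X$,
$\exists \epsilon > 0$ such that $\forall \lambda \in [-\epsilon, \epsilon]$ we have $x + \lambda y \in A$.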


The following generalized interiority concepts were introduced in [8] and 
[17, 18] respectively in the context of Fenchel duality theory, to frame a
sufficient condition for strong duality that is weaker than the classical Slater 
condition. 


Definition 2. Let $A$ be a convex set in a topological vector space and $x \in A$.
Then


1 The quasi relative interior of $A$ ($\operatorname{qri} A$) consists of all $x$ in $A$ for which
$\overline{\operatorname{cone}}\,(A - x)$ is a closed subspace of $X$;

2 The strong quasi relative interior of $A$ ($\operatorname{sqri} A$) consists of all $x$ in $X$ for
which $\operatorname{cone}(A - x)$ is a closed subspace of $X$.


Note that $0 \in \operatorname{core} A$ if and only if $\operatorname{cone} A = X$, and that in general,
$\operatorname{core} A \subset \operatorname{sqri} A \subset \operatorname{qri} A$.

Nearly all modern results in conjugate duality theory use constraint qual- 
ifications based on one or other of these generalized interiors. Some appli- 
cations of such duality results to (discrete-time) feedback optimal control 
may be found in [11] and [15]. An example of an application to a problem in 
deterministic optimal control (that is, without feedback) is outlined in [17]. 

From [16] and [20] we have the following. Recall that a set $A$ in a topological
linear space $X$ is ideally convex if for any bounded sequence $\{x_n\} \subset A$
and $\{\lambda_n\}$ of nonnegative numbers with $\sum_{n=1}^{\infty} \lambda_n = 1$, the series $\sum_{n=1}^{\infty} \lambda_n x_n$
either converges to an element of $A$, or else does not converge at all. Open
or closed convex sets are ideally convex, as is any finite-dimensional convex
set. In particular, if $X$ is Banach, then such series always converge, and the
definition of ideal convexity only requires that $\sum_{n=1}^{\infty} \lambda_n x_n$ be in $A$. From [16,
Section 17E] we have the following proposition.


Proposition 1. For a Banach space $X$,


1 If $C \subset X$ is closed convex, it is ideally convex.

2 For ideally convex $C$, $\operatorname{core} C = \operatorname{core} \overline{C} = \operatorname{int} C = \operatorname{int} \overline{C}$.

3 If $A$ and $B$ are ideally convex subsets of $X$, one of which is bounded, then
$A - B$ is ideally convex.


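Proof. We prove the last assertion only; the rest can be found in the cited
reference. Let $\{a_n - b_n\} \subset A - B$ be a bounded sequence and let $\lambda_n \ge 0$
be such that $\sum_{n=1}^{\infty} \lambda_n = 1$. Then, due to the assumed boundedness of
one of $A$ or $B$, $\{a_n\} \subset A$ and $\{b_n\} \subset B$ are both bounded, yielding the
convergent sums $\sum_{n=1}^{\infty} \lambda_n a_n \in A$ and $\sum_{n=1}^{\infty} \lambda_n b_n \in B$. Thus
$\sum_{n=1}^{\infty} \lambda_n (a_n - b_n) = \sum_{n=1}^{\infty} \lambda_n a_n - \sum_{n=1}^{\infty} \lambda_n b_n \in A - B$.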


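Corollary 1. Let $A$ and $C$ be closed convex subsets of the Banach space $X$.
Then


$0 \in \operatorname{core}(A - C)$ implies $0 \in \operatorname{int}\left(A \cap B(0, \rho) - C \cap B(0, \rho)\right)$
for some $\rho > 0$.


Proof. Let $\rho > \inf_{A \cap C} \|\cdot\|$ and let $\bar{x} \in A \cap C \cap B(0, \rho)$. Then for $x \in X$,
$x = \lambda(a - c)$ for some $\lambda > 0$, $a \in A$, $c \in C$ since by assumption, $\operatorname{cone}(A - C) = X$.
Then for any $t \ge 1$ sufficiently large so that $t^{-1}(\|a\| + \|c\|) + \|\bar{x}\| < \rho$,


$x = t\lambda \left( \left[ \tfrac{1}{t} a + \left(1 - \tfrac{1}{t}\right) \bar{x} \right] - \left[ \tfrac{1}{t} c + \left(1 - \tfrac{1}{t}\right) \bar{x} \right] \right)$


$\in t\lambda \left( A \cap B(0, \rho) - C \cap B(0, \rho) \right) \subset \operatorname{cone}\left( A \cap B(0, \rho) - C \cap B(0, \rho) \right)$.


Hence $0 \in \operatorname{core}\left(A \cap B(0, \rho) - C \cap B(0, \rho)\right)$ from the arbitrariness of $x \in X$.
The result follows since by Proposition 1 the core and interior of
$A \cap B(0, \rho) - C \cap B(0, \rho)$ coincide.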


4.3 System-theoretic preliminaries 


4.3.1 Basic system concepts 


In its most abstract form, a system may be viewed as a map $H : X_i \to X_o$
between a space $X_i$ of inputs and a space $X_o$ of outputs. In the theory of
feedback stabilization of systems, interconnections appear where the output
of one system forms the input of another, and for this to make sense, $X_i$
and $X_o$ will, for simplicity, be taken to be the same space $X$. The system $H$
is said to be linear, if $X$ is a linear space and $H$ a linear operator thereon.
Our focus will be on SISO (single-input/single-output) linear discrete-time
systems. In this case, $X$ will normally be contained in the space $\mathbb{R}^{\mathbb{Z}}$ of
real-valued sequences.
For $n \in \mathbb{N}$ define a time-shift $\tau_n$ on sequences in $X$ by


$(\tau_n \phi)_i = \phi_{i-n}$, $\phi \in X$.


If each $\tau_n$ commutes with $H$, then $H$ is called shift (or time-) invariant. Our
interest will be in linear time-invariant (or LTI) systems.

It is well known [12] that $H$ is LTI if and only if it takes the form of
the convolution operator $h\,*$ for some $h \in X$. This $h$ is called the
impulse-response of $H$, since $h = H(\delta) = h * \delta$, where $\delta$ is the (Dirac) delta-function in
continuous time, or is the unit pulse sequence $(1, 0, 0, \ldots)$ in discrete time.

The discrete-time LTI system $H = h\,*$ is causal if the support of $h$ lies
in the positive time-axis $\mathbb{N} = \{0, 1, 2, \ldots\}$. The significance of this notion is
clarified after observing the action of $H$ on an input $u$, which takes the form
$(Hu)_n = \sum_{k \le n} h_k u_{n-k}$, so if $h$ takes any nonzero values for negative time
then evaluation of the output at time $n$ would require foreknowledge of input
behavior at later times.

Note that for LTI systems, a natural (but by no means the only) choice for
the “space of signals” $X$ is the space of sequences for which the z-transform
exists. From now on we identify a LTI system $H$ with its impulse-response


$h := H(\delta)$.


Definition 3. The LTI system $h$ is BIBO (bounded-input/bounded-output)-stable
if $h * u \in \ell^\infty$ whenever $u \in \ell^\infty$.


From the proof of [12, Theorem 7.1.5], it is known that $H = h\,*$ is
BIBO-stable if and only if $h \in \ell^1$.
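
The sufficiency half of this characterization rests on the elementary estimate $\|h * u\|_\infty \le \|h\|_1 \|u\|_\infty$. The sketch below checks that estimate on a finite example; the geometric impulse response and the alternating input are hypothetical choices made only for illustration.

```python
def l1_norm(h):
    return sum(abs(x) for x in h)

def sup_norm(u):
    return max(abs(x) for x in u)

def causal_output(h, u):
    """Return (h * u)_n = sum_{k=0}^{n} h_k u_{n-k} for n = 0, ..., len(u) - 1."""
    out = []
    for n in range(len(u)):
        total = 0.0
        # Only indices k with a stored h_k contribute to the sum.
        for k in range(min(n, len(h) - 1) + 1):
            total += h[k] * u[n - k]
        out.append(total)
    return out

h = [0.5 ** k for k in range(30)]        # finite piece of a geometric impulse response
u = [(-1.0) ** n for n in range(200)]    # bounded input with sup norm 1
y = causal_output(h, u)
bound = l1_norm(h) * sup_norm(u)

print(sup_norm(y), "<=", bound)
```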

Any LTI system $H$ can be characterized by its transfer function, defined
to be the z-transform $\hat{h}$ of its impulse-response $h$. The input-output relation
then takes the form $\hat{u} \mapsto \hat{h}\hat{u}$ for appropriate inputs $u$. Thus convolution
equations can be solved by attending to an algebraic relation, much simplifying
the analysis of such systems.

For systems $H_1$, $H_2$ we shall often omit the convolution symbol $*$ from
products $H_1 H_2$, which will be understood to be the system $(h_1 * h_2)\,*$. By
the commutativity of $*$, $H_1 H_2 = H_2 H_1$. This notation will be useful in
that now formal manipulation of (LTI) systems is identical to that of their
transforms (that is, transfer functions). Consequently, we may let the symbol
$H$ stand either for the system, or its impulse-response $h$, or its transfer
function $\hat{h}$.

We now express stability in terms of rational transfer functions. The algebra
$R_\infty$ to be defined below shall have a fundamental role in the theory of
feedback-stabilization of systems. First, we recall some notation:


$\mathbb{R}[z]$: the space of polynomials with real coefficients, of the complex
variable $z$;

$\mathbb{R}(z)$: the space of rational functions with real coefficients, of the complex
variable $z$. Here


$R_\infty := \{\hat{h} \in \mathbb{R}(z) \mid \hat{h} \text{ has no poles in the closed unit disk}\}$.

It is readily established that

$R_\infty = \hat{\ell}^1 \cap \mathbb{R}(z)$,


so $R_\infty$ forms the set of all rational stable transfer functions.


4.3.2 Feedback stabilization of linear systems 


Here we summarize the theory of stabilization of (rational) LTI systems by 
use of feedback connection with other (rational) LTI systems. All definitions 
and results given in this subsection may be found in [24]. 

Consider the feedback configuration of Figure 4.1. Here $w$ is the “reference
input,” $e = w - y$ is the “(tracking) error,” representing the gap between
the closed-loop output $y$ and the reference input $w$, and $u$ denotes the control
signal (or “input activity,” or “actuator output”).


Fig. 4.1 A closed-loop control system.


Definition 4. A (rational) LTI discrete-time system $K$ is said to (BIBO-)
stabilize the (rational) LTI system $P$ if the closed loop in Figure 4.1 is stable
in the sense that for any bounded input $w \in \ell^\infty$ and any bounded
additively-applied disturbance (such as $\Delta$ in Figure 4.1) at any point in the loop, all
resulting signals in the loop are bounded. Such $K$ is referred to as a stabilizing
compensator or controller.


It should be noted that the definition as stated applies to general non-LTI,
even nonlinear systems, and amounts to the requirement of “internal,” as well
as “external,” stability. This is of importance, since real physical systems have
a finite operating range, and it would not be desirable for, say, the control
signal generated by $K$ to become too large and “blow up” the plant $P$.

Writing $P = p\,*$ and $K = k\,*$, with $\hat{p}$ and $\hat{k}$ in $\mathbb{R}(z)$ and noting that the
transfer function between any two points of the loop can be constructed by
addition or multiplication from the three transfer functions given in (4.1)
below, we obtain the following well-known result.


Proposition 2. $K$ stabilizes $P$ if and only if
$1/(1 + \hat{p}\hat{k})$, $\hat{k}/(1 + \hat{p}\hat{k})$ and $\hat{p}/(1 + \hat{p}\hat{k}) \in R_\infty$. (4.1)


We denote by $S(P)$ the set of rational stabilizing compensators for the
plant $P$. The fundamental result of the theory of feedback stabilization (the
YJBK factorization; see Proposition 3) states that $S(P)$ has the structure
of an affine subset of $\mathbb{R}(z)$. Before we move to a precise statement of
this result, we remind the reader that we only intend to deal with
single-input/single-output systems. In the multi-input/multi-output case, each
system would be represented by a matrix of transfer functions, which greatly
complicates the analysis. We note, however, that this factorization result does
extend to this case (see [24, Chapter 5]).


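Definition 5. Let $\hat{n}$ and $\hat{d}$ be in $R_\infty$. We say that $\hat{n}$ and $\hat{d}$ are coprime if
there exist $\hat{x}$ and $\hat{y}$ in $R_\infty$ such that


$\hat{x}\hat{n} + \hat{y}\hat{d} = 1$ in $\mathbb{R}(z)$,
(that is, for all $z \in \mathbb{C}$, $\hat{x}(z)\hat{n}(z) + \hat{y}(z)\hat{d}(z) = 1$ except at singularities). This
can be easily shown to be identical to coprimeness in $R_\infty$ considered as an
abstract ring.


Let $\hat{p} \in \mathbb{R}(z)$. It has a coprime factorization $\hat{p} = \hat{n}/\hat{d}$ where $\hat{n}$ and $\hat{d}$ are in
$R_\infty$. Indeed, we can write $\hat{p} = \hat{n}/\hat{d}$ for polynomials $\hat{n}$ and $\hat{d}$ having no common
factors, which implies coprimeness in $\mathbb{R}[z]$ and hence in $R_\infty \supseteq \mathbb{R}[z]$.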

We can now state the fundamental theorem [24, Chapter 2], also referred 
to as the Youla (or YJBK) parameterization. 


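Proposition 3. Let the plant $P$ have rational transfer function $\hat{p} \in \mathbb{R}(z)$, let
$\hat{n}$ and $\hat{d}$ in $R_\infty$ form a coprime factorization and let $\hat{x}$ and $\hat{y}$ in $R_\infty$ arise
from the coprimeness of $\hat{n}$ and $\hat{d}$. Then


$S(P) = \left\{ \dfrac{\hat{x} + \hat{d}\hat{q}}{\hat{y} - \hat{n}\hat{q}} \;\middle|\; \hat{q} \in R_\infty,\ \hat{q} \ne \hat{y}/\hat{n} \right\}$. (4.2)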


This result has the following consequence for the stabilized closed-loop
mappings. Recall that $1/(1 + \hat{p}\hat{k})$ is the transfer function taking input $w$ to $e$
(that is, $\hat{e} = \hat{w}/(1 + \hat{p}\hat{k})$) and $\hat{k}/(1 + \hat{p}\hat{k})$ maps the input $w$ to $u$. We now have
(see [24]) the following result.

Corollary 2. The set of all closed-loop maps $\Phi$ taking $w$ to $(e, u)$, achieved
by some stabilizing compensator $C \in S(P)$, is


$\left\{ \hat{\Phi} = \hat{d} \begin{pmatrix} \hat{y} \\ \hat{x} \end{pmatrix} - \hat{q}\hat{d} \begin{pmatrix} \hat{n} \\ -\hat{d} \end{pmatrix} \;\middle|\; \hat{q} \in R_\infty,\ \hat{q} \ne \hat{y}/\hat{n} \right\}$ (4.3)


and has the form of an affine set in $\mathbb{R}(z) \times \mathbb{R}(z)$.
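
The affine form of the closed-loop maps can be sanity-checked on a toy example. The sketch below uses a hypothetical plant with $\hat{n}(z) = z$ and $\hat{d}(z) = 1 - 2z$, the Bezout pair $\hat{x} = 2$, $\hat{y} = 1$ (so that $\hat{x}\hat{n} + \hat{y}\hat{d} = 1$), and the hypothetical parameter $\hat{q}(z) = 1/(z - 3)$, whose only pole lies outside the closed unit disk. It compares the loop formulas $1/(1 + \hat{p}\hat{k})$ and $\hat{k}/(1 + \hat{p}\hat{k})$ with the affine expressions $\hat{d}(\hat{y} - \hat{n}\hat{q})$ and $\hat{d}(\hat{x} + \hat{d}\hat{q})$ at a few sample points.

```python
# Hypothetical coprime factorization data; x*n + y*d = 2z + (1 - 2z) = 1.
def n_hat(z):
    return z

def d_hat(z):
    return 1 - 2 * z

def x_hat(z):
    return 2

def y_hat(z):
    return 1

def q_hat(z):
    # Free parameter, assumed here to have its only pole at z = 3.
    return 1 / (z - 3)

def plant(z):
    return n_hat(z) / d_hat(z)

def controller(z):
    # Parameterized compensator (x + d q) / (y - n q).
    return (x_hat(z) + d_hat(z) * q_hat(z)) / (y_hat(z) - n_hat(z) * q_hat(z))

def closed_loop_direct(z):
    # Maps w -> e and w -> u computed from the loop equations.
    sensitivity = 1 / (1 + plant(z) * controller(z))
    return sensitivity, controller(z) * sensitivity

def closed_loop_affine(z):
    # The same maps written in the affine form d (y, x) - q d (n, -d).
    e_map = d_hat(z) * (y_hat(z) - n_hat(z) * q_hat(z))
    u_map = d_hat(z) * (x_hat(z) + d_hat(z) * q_hat(z))
    return e_map, u_map

samples = [0.3 + 0.4j, -0.7j, 0.9, -0.2 + 0.1j]
max_gap = max(
    abs(direct - affine)
    for z in samples
    for direct, affine in zip(closed_loop_direct(z), closed_loop_affine(z))
)
print("largest discrepancy:", max_gap)
```

The sample points avoid $z = 1/2$, where the hypothetical plant has its pole and the direct formula is undefined.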


4.4 Formulation of the optimization problem in $\ell^1$


In the following, we consider SISO (single-input/single-output) rational,
linear time-invariant (LTI) systems. For such a system (the ‘plant’) we
characterize the set of error-signal and control-signal pairs achievable for some
stabilizing compensator, in the one-degree-of-freedom feedback configuration
(Figure 4.1). The derivation, based on the Youla parameterization, is
standard.

Assume that the reference input $w \ne 0$ has rational z-transform and the
plant $P$ is rational and causal, so it has no pole at 0. For $a = e^{i\theta}$ on the unit
circle, define a subspace $X_a$ of $\ell^\infty$ by


$X_a = \ell^1 + \operatorname{span}\{c, s\}$,


where
$c := \{\cos k\theta\}_{k=0}^{\infty}$ and $s := \{\sin k\theta\}_{k=0}^{\infty}$.


We shall assume that the error signals (usually denoted by $e$ here) are in $\ell^1$,
so we are considering only those controllers that make the closed-loop system
track the input (in an $\ell^1$-sense), and we shall assume also that the associated
control signal remains bounded (and in fact resides in $X_a$).


Definition 6. For any $u \in X_a$, let $u_c$ and $u_s$ denote the (unique) real
numbers such that $u - u_c c - u_s s$ is in $\ell^1$.


Let the plant $P$ have transfer function $P(z)$ with the coprime factorization
$P(z) = \hat{n}(z)/\hat{d}(z)$ where $\hat{n}$ and $\hat{d}$ are members of $\mathbb{R}[z]$ (the space of
polynomials, with real coefficients, in the complex variable $z$). Write $\hat{n}(z) = \sum_{i=0}^{s} n_i z^i$,
$\hat{d}(z) = \sum_{j=0}^{t} d_j z^j$, $n := (n_0, n_1, .., n_s, 0, ..)$ and $d := (d_0, .., d_t, 0, ..)$.

Let $x$ and $y$ be finite-length sequences such that $\hat{x}\hat{n} + \hat{y}\hat{d} = 1$ in $\mathbb{R}[z]$.
Their existence is a consequence of the coprimeness of $\hat{n}$ and $\hat{d}$ in $\mathbb{R}[z]$
(see [24]).

If we aim to perform an optimization over a subset of the set $S(P)$ of
stabilizing controllers $K$ for $P$, such that for each $K$ in this subset, the
corresponding error signal $e(K)$ and control output $u(K)$ are in $\ell^1$ and $X_a$
respectively, the appropriate feasible set is

$F_0 := \{(e, u) \in \ell^1 \times X_a \mid e = e(K) \text{ for some } K \in S(P),\ u = K * e\}$
$\phantom{F_0 :}= \{(e, u) \in \ell^1 \times X_a \mid (e, u) = w * d * (y, x) - q * w * d * (n, -d)$
$\phantom{F_0 := \{} \text{for some } q \text{ such that } \hat{q} \in R_\infty \setminus \{\hat{y}/\hat{n}\}\}$,


where the latter equality follows from the YJBK factorization for $S(P)$ (use
Proposition 3 or Corollary 2).

Henceforth we shall always assume that $w \in X_a$ is rational (that is, has
rational z-transform), the plant $P$ is rational and has no zero at $a$.

By the lemma to follow (whose proof is deferred to the Appendix) we
observe that a single sinusoid suffices to characterize the asymptotic (or
steady-state) behavior of $u$ for all feasible signal pairs $(e, u) \in F_0$.


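Lemma 1. Let $w \in X_a$ and $P$ be rational and assume that $P(a) \ne 0$. Then
for any $(e, u) \in F_0$,

$u_c = \beta_c := w_c \operatorname{Re} \dfrac{1}{P(a)} - w_s \operatorname{Im} \dfrac{1}{P(a)}$,

$u_s = \beta_s := w_s \operatorname{Re} \dfrac{1}{P(a)} + w_c \operatorname{Im} \dfrac{1}{P(a)}$.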
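
The two formulas of Lemma 1 amount to one complex division: reading the signs as $u_c = w_c \operatorname{Re}(1/P(a)) - w_s \operatorname{Im}(1/P(a))$ and $u_s = w_s \operatorname{Re}(1/P(a)) + w_c \operatorname{Im}(1/P(a))$ (an assumption where the source is hard to read), one has $\beta_c + i\beta_s = (w_c + i w_s)/P(a)$. The sketch below evaluates this for a hypothetical plant $P(z) = z/(1 - 2z)$ and a hypothetical angle $\theta = \pi/3$.

```python
import cmath

def steady_state_coefficients(w_c, w_s, p_at_a):
    """Return (beta_c, beta_s) from the sinusoidal coefficients of w and the value P(a)."""
    inv = 1 / p_at_a
    beta_c = w_c * inv.real - w_s * inv.imag
    beta_s = w_s * inv.real + w_c * inv.imag
    return beta_c, beta_s

theta = cmath.pi / 3
a = cmath.exp(1j * theta)        # point on the unit circle
p_at_a = a / (1 - 2 * a)         # hypothetical plant P(z) = z / (1 - 2z), evaluated at a
w_c, w_s = 1.0, 0.5              # hypothetical sinusoidal coefficients of w

beta_c, beta_s = steady_state_coefficients(w_c, w_s, p_at_a)
print(beta_c, beta_s)
```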
Using this lemma, we can translate $F_0$ in the $u$ variable to obtain the set
$F := F_0 - (0, \beta_c c + \beta_s s)$, having the form (where we use the notation
$x * (y, z) := (x * y, x * z)$ for sequences $x$, $y$, $z$)

$F = \{(e, u) \in (\ell^1)^2 \mid (e, u) = w * d * (y, x) - (0, \beta_c c + \beta_s s) - q * w * d * (n, -d)$
$\phantom{F = \{} \text{for some } q \text{ such that } \hat{q} \in R_\infty \setminus \{\hat{y}/\hat{n}\}\}$.


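We need to recast $F$ into a form to which the tools of optimization theory
can be applied. As a first step, formally define the sets

$$M := \left\{ (e, u) \in (\ell^1)^2 \;\middle|\; \begin{array}{ll} \hat{e}(\bar{p}_i) = 0 \ (\bar{p}_i \text{ pole of } P: |\bar{p}_i| \le 1), & i = 1, .., m_1 \\ \hat{e}(\bar{z}_j) = \hat{w}(\bar{z}_j) \ (\bar{z}_j \text{ zero of } P: |\bar{z}_j| \le 1), & j = 1, .., m_2 \\ \hat{e}(\bar{v}_k) = 0 \ (\bar{v}_k \text{ zero of } \hat{w}: |\bar{v}_k| \le 1), & k = 1, .., m_3 \\ d * e + n * u = w * d - n * (\beta_c c + \beta_s s) & \end{array} \right\},$$

$M_r := \{(e, u) \in M \mid \hat{e}, \hat{u} \text{ rational},\ e \ne 0\}$ and

$$M^{(e)} := \left\{ e \in \ell^1 \;\middle|\; \begin{array}{ll} \hat{e}(\bar{p}_i) = 0 \ (\bar{p}_i \text{ a pole of } P \text{ with } |\bar{p}_i| \le 1), & i = 1, .., m_1 \\ \hat{e}(\bar{z}_j) = \hat{w}(\bar{z}_j) \ (\bar{z}_j \text{ a zero of } P \text{ with } |\bar{z}_j| \le 1), & j = 1, .., m_2 \\ \hat{e}(\bar{v}_k) = 0 \ (\bar{v}_k \text{ a zero of } \hat{w} \text{ with } |\bar{v}_k| \le 1), & k = 1, .., m_3 \end{array} \right\},$$

with the understanding that in the above sets, whenever $P$ and $\hat{w}$ have
a common pole at $a$ (and hence at $\bar{a}$), the constraint $\hat{e}(a) = 0$ is absent.
Moreover, note that in the definition of $M$, the constraints on $\hat{e}$ at the
zeros $\bar{z}$ (if $\bar{z} \ne a, \bar{a}$) are redundant, as follows from the closed-loop equation
$d * e + n * u = w * d - n * (\beta_c c + \beta_s s)$.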

The above constraint system can also be obtained from the general mul- 
tivariable formalism of [9], [14] etc., on removal of the redundancies in the 
latter. The content of the following remark will be used repeatedly in our 
work. 


Remark 1. We note the following relation between $M^{(e)}$ and $M$. Let $(\bar{e}, \bar{u})$ be
any element of $M$. Then $M^{(e)} - \bar{e}$ and $M - (\bar{e}, \bar{u})$ consist of elements satisfying
the corresponding constraints with right-hand sides set to zero. Assuming
that $P$ (equivalently, $\hat{n}$) has no zeros on the unit circle, and that all its zeros
in $\mathbb{D}$ are simple, then the map $T$ on $M^{(e)} - \bar{e}$ taking $e$ to $-Z^{-1}(\hat{d}\hat{e}/\hat{n})$ maps
into $\ell^1$ (by Lemma 4 below) with $\|T\| \le \kappa \|d\|_1$. Then $(e, Te) \in M - (\bar{e}, \bar{u})$,
since $d * e + n * Te = 0$, which follows on taking z-transforms.


The next two lemmas give simple sufficient conditions for the feasible 
set F to be fully recharacterized as an affine subspace defined by a set of 
linear constraints, and for when elements of finite length exist. Since they are 
only minor modifications of standard arguments, we omit the proofs here, 
relegating them to the Appendix. 


Lemma 2. Assume that either


1 in $\overline{\mathbb{D}}$ the poles/zeros of $P$ and $\hat{w}$ are distinct and simple; or
2 in $\overline{\mathbb{D}} \setminus \{a, \bar{a}\}$ the poles/zeros of $P$ and $\hat{w}$ are distinct and simple, and $P$ and
$\hat{w}$ have a common simple pole at $z = a$.


Then $M_r = F$.


Lemma 3. Let $P$ and $w \in X_a$ be rational with $P(a) \ne 0$. Assume also the
following:

1 either

(a) in $\overline{\mathbb{D}}$ the poles/zeros of $P$ and $\hat{w}$ are distinct and simple; or
(b) in $\overline{\mathbb{D}} \setminus \{a, \bar{a}\}$ the poles/zeros of $P$ and $\hat{w}$ are distinct and simple, and $P$
and $\hat{w}$ have a common simple pole at $z = a$;

2 $(w - w_c c - w_s s) * d$ has finite length;
3 all zeros (in $\mathbb{C}$) of $P$ are distinct and simple.


Then $M$ contains elements $(\bar{e}, \bar{u})$ of finite length.


Remark 2. In the above result, one can always choose that $\bar{e} \ne 0$, so that
$(\bar{e}, \bar{u}) \in M_r$, which coincides with $F$ by Lemma 2. Thus the latter is nonempty.


Lemma 4. Let $\hat{f} \in \hat{\ell}^1$ and let $p(\cdot)$ be a polynomial with real coefficients with
no zeros on the unit circle. Assume also at each zero of $p(\cdot)$ in the open unit
disk, that $\hat{f}$ has a zero also, of no lesser multiplicity. Then $\hat{f}(\cdot)/p(\cdot) \in \hat{\ell}^1$.
Also, if $q = Z^{-1}(\hat{f}/p)$, then $\|q\|_1 \le \kappa \|f\|_1$ where


$1/\kappa = \prod_{a, a' \text{ zeros of } p:\ |a| < 1,\ |a'| > 1} (1 - |a|)\left(1 - \dfrac{1}{|a'|}\right)$,


and where in the above product, the zeros appear according to multiplicity,
and moreover, both members of any conjugate pair of zeros are included.


Proof. See the Appendix.

Remark 3. Such a result will not hold in general if $p(\cdot)$ has zeros on the unit
circle; a counterexample for $|a| = 1$ is as follows.

Let $f_0 = -1$ and $f_j = \dfrac{1}{a^j\, j(j+1)}$ for $j \ge 1$. Then $f \in \ell^1(\mathbb{C})$ and $\hat{f}(a) = 0$ since
$\sum_{j=1}^{\infty} \dfrac{1}{j(j+1)} = 1$. If $q$ is the inverse z-transform of $\left[z \mapsto \dfrac{\hat{f}(z)}{z - a}\right]$, then for each
$k$,

$|q_k| = \left| \dfrac{1}{a^{k+1}} \sum_{j > k} \dfrac{1}{j(j+1)} \right| = \dfrac{1}{k+1}$,


so $q \notin \ell^1(\mathbb{C})$.
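
The mechanism behind Lemma 4 can be seen in the simplest case $p(z) = z - a$ with $|a| < 1$. If $\hat{f}(a) = 0$, the quotient $q = Z^{-1}(\hat{f}/p)$ is given by $q_n = \sum_{j > n} f_j a^{j-n-1}$, which yields $\|q\|_1 \le \|f\|_1 / (1 - |a|)$. The sketch below checks this on a finite example; the data and helper names are hypothetical, and the bound shown is only this single-zero special case, not the general constant of the lemma.

```python
def multiply_by_z_minus_a(q, a):
    """Coefficients of (z - a) * q_hat(z): f_n = q_{n-1} - a * q_n."""
    f = [0.0] * (len(q) + 1)
    for n, q_n in enumerate(q):
        f[n] -= a * q_n
        f[n + 1] += q_n
    return f

def divide_by_z_minus_a(f, a):
    """Quotient q with q_n = sum_{j>n} f_j a^(j-n-1); valid when f_hat(a) = 0."""
    q = [0.0] * (len(f) - 1)
    acc = 0.0
    # Backward recursion q_n = f_{n+1} + a * q_{n+1}, starting from zero.
    for n in range(len(f) - 2, -1, -1):
        acc = f[n + 1] + a * acc
        q[n] = acc
    return q

def l1_norm(seq):
    return sum(abs(x) for x in seq)

a = 0.5                              # zero of p inside the open unit disk
q_true = [1.0, -2.0, 0.5, 3.0]       # hypothetical quotient
f = multiply_by_z_minus_a(q_true, a) # f_hat vanishes at z = a by construction
q = divide_by_z_minus_a(f, a)

print(q)
print(l1_norm(q), "<=", l1_norm(f) / (1 - abs(a)))
```

As $|a|$ approaches 1 the factor $1/(1 - |a|)$ blows up, which is consistent with the failure described in Remark 3.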


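The hard bounds on the error and on input activity will be represented
by the closed convex sets $C^{(e)}$ and $C^{(u)}$ respectively, given by


$C^{(e)} := \{e \in \ell^1 \mid B_i^{(e)} \le e_i \le A_i^{(e)} \text{ for all } i\}$, (4.4)

where $B_i^{(e)} \le 0 \le A_i^{(e)}$ eventually for large $i$, and

$C^{(u)} := \{u \in \ell^1 \mid B_i^{(u)} \le u_i \le A_i^{(u)} \text{ for all } i\}$, (4.5)

where $B_i^{(u)} \le 0 \le A_i^{(u)}$ eventually for large $i$, and

$C := C^{(e)} \times C^{(u)}$. (4.6)

We shall also make use of the truncated sets


$C_k^{(e)} := \{e \in C^{(e)} \mid e_i = 0 \text{ for } i > k\}$. (4.7)


We shall not use the truncated sets $C_k^{(u)}$, for reasons that will be indicated
in Section 4.6.1.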


4.5 Convergence tools 


It is well known that there is no general correspondence between pointwise
convergence of functions, and the convergence, if any, of their infima, or of
optimal or near optimal elements. Hence approximation schemes for
minimization of functions, based on this concept, are generally useless. In contrast,
schemes based on monotone pointwise convergence, or on uniform
convergence, have more desirable consequences in this regard. However, the cases
above do not include all approximation schemes of interest.

In response to this difficulty, a new class of convergence notions has gained 
prominence in optimization theory, collectively referred to as “variational con- 
vergence” or “epi-convergence.” These are generally characterized as suitable 
set-convergences of epigraphs. For an extensive account of the theory, the 
reader is referred to [1, 7]. Such convergences have the desirable property 
that epi-convergence of functions implies, under minimal assumption, the 
convergence of the infima to that of the limiting function, and further, for 
some cases, the convergence of the corresponding optimal, or near-optimal, 
elements. 

The convergence notion with the most powerful consequences for the con- 
vergence of infima is that of Attouch and Wets, introduced by them in [2, 3, 4]. 
Propositions 4 and 5 (see below) will form the basis for the calculation of ex- 
plicit error estimates for our truncation scheme. 


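Definition 7. For sets $C$ and $D$ in a normed space $X$, and $\rho > 0$,

$d(x, D) := \inf_{d \in D} \|x - d\|$,

$e_\rho(C, D) := \sup_{x \in C \cap B(0, \rho)} d(x, D)$,

$\operatorname{haus}_\rho(C, D) := \max\{e_\rho(C, D), e_\rho(D, C)\}$ (the $\rho$-Hausdorff distance).


For functions $f$ and $g$ mapping $X$ to $\overline{\mathbb{R}}$, the “$\rho$-epi-distance” between $f$ and
$g$ is defined by


$d_\rho(f, g) := \operatorname{haus}_\rho(\operatorname{epi} f, \operatorname{epi} g)$.

It is easily shown that

$\operatorname{haus}_\rho(C, D) = d_\rho(\delta_C, \delta_D)$.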
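
For finite sets of real numbers the quantities in Definition 7 can be computed directly. The sketch below makes two assumptions that the definition leaves open in this extract: the ball is taken to be closed, and a supremum over an empty set is taken to be 0. The sets and helper names are hypothetical.

```python
def dist(x, D):
    """d(x, D) = inf over d in D of |x - d| for a finite non-empty set D of reals."""
    return min(abs(x - d) for d in D)

def excess(C, D, rho):
    """e_rho(C, D): largest distance to D among points of C with |x| <= rho."""
    inside = [x for x in C if abs(x) <= rho]
    return max((dist(x, D) for x in inside), default=0.0)

def haus(C, D, rho):
    """rho-Hausdorff distance max{e_rho(C, D), e_rho(D, C)}."""
    return max(excess(C, D, rho), excess(D, C, rho))

C = [0.0, 1.0, 5.0]
D = [0.0, 1.5, 10.0]

print(haus(C, D, rho=2.0))    # only the points near the origin are compared
print(haus(C, D, rho=20.0))   # the far points now contribute
```

The example shows why the distance is indexed by $\rho$: sets that agree closely near the origin can still differ far away, and the bounded version only sees the part inside the ball.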


Definition 8. Let $C_n$ and $C$ be subsets of $X$. Then $C_n$ Attouch-Wets
converges to $C$ iff for each $\rho > 0$, we have

\[ \lim_{n \to \infty} \operatorname{haus}_\rho(C_n, C) = 0. \]

If $f_n$ and $f$ are $\overline{\mathbb{R}}$-valued functions on $X$, then $f_n$ Attouch-Wets (or
epi-distance) converges to $f$ if for each $\rho > 0$,

\[ \lim_{n \to \infty} d_\rho(f_n, f) = 0 \]

(that is, if their epigraphs Attouch-Wets converge).


This convergence concept is well suited to the treatment of approximation
schemes in optimization, since the epi-distances $d_\rho$ provide a measure of the
difference between the infima of two functions, as indicated by the following
result of Attouch and Wets. For $\varphi : X \to \overline{\mathbb{R}}$ with $\inf_X \varphi$ finite, we denote by
$\varepsilon$-argmin $\varphi$ the set $\{x \in X \mid \varphi(x) \le \inf_X \varphi + \varepsilon\}$ of $\varepsilon$-approximate minimizers
of $\varphi$ on $X$.


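Proposition 4. [4, Theorem 4.3] Let $X$ be normed and let $\varphi$ and $\psi$ be proper
$\overline{\mathbb{R}}$-valued functions on $X$, such that

1. $\inf_X \varphi$ and $\inf_X \psi$ are both finite;
2. there exists $\rho_0 > 0$ such that for all $\varepsilon > 0$, $(\varepsilon\text{-argmin}\,\varphi) \cap B(0,\rho_0) \ne \emptyset$ and
   $(\varepsilon\text{-argmin}\,\psi) \cap B(0,\rho_0) \ne \emptyset$.

Then

\[ \left| \inf_X \varphi - \inf_X \psi \right| \le d_{\alpha(\rho_0)}(\varphi, \psi), \]

where $\alpha(\rho_0) := \max\{\rho_0,\ 1 + |\inf_X \varphi|,\ 1 + |\inf_X \psi|\}$.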


Since our optimization problems are expressible as the infimum of a sum
of two (convex) functions, we will find useful a result on the Attouch-Wets
convergence of the sum of two Attouch-Wets convergent families of (l.s.c.
convex) functions. The following result by Azé and Penot may be found in
full generality in [5, Corollary 2.9]. For simplicity, we only quote the form
this result takes when the limit functions are both nonnegative on $X$.


Proposition 5. Let $X$ be a Banach space, let $f_n$, $f$, $g_n$ and $g$ be proper closed
convex $\overline{\mathbb{R}}$-valued functions on $X$, with $f \ge 0$ and $g \ge 0$. Assume that we have
the Attouch-Wets convergence $f_n \to f$ and $g_n \to g$, and that for some $s > 0$,
$t > 0$ and $r > 0$,

\[ B(0,s)^2 \subseteq \Delta(X) \cap B(0,r)^2 - (\{f \le r\} \times \{g \le r\}) \cap B(0,t)^2, \tag{4.8} \]

where $\Delta(X) := \{(x,x) \mid x \in X\}$ and $B(0,s)^2$ denotes a ball in the box norm
in $X \times X$ (that is, $B(0,s)^2 = B(0,s) \times B(0,s)$). Then for each $\rho > 2r + t$
and all $n \in \mathbb{N}$ such that

\[ d_{\rho+s}(f_n, f) + d_{\rho+s}(g_n, g) < s, \]

we have

\[ d_\rho(f_n + g_n, f + g) \le \frac{2r + s + \rho}{s} \left[ d_{\rho+s}(f_n, f) + d_{\rho+s}(g_n, g) \right]. \]

In particular, $f_n + g_n$ Attouch-Wets converges to $f + g$.


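Corollary 3. Let $C_n$, $C$ and $M$ be closed convex subsets of a Banach space
$X$ and let $f_n$ and $f$ be proper closed convex $\overline{\mathbb{R}}$-valued functions on $X$ with
$f \ge 0$ and Attouch-Wets convergence $f_n \to f$ and $C_n \to C$. Suppose that for
some $s > 0$, $t > 0$ and $r > 0$,

\[ B(0,s)^2 \subseteq \Delta(X) \cap B(0,r)^2 - (C \times M) \cap B(0,t)^2; \tag{4.9} \]
\[ B(0,s)^2 \subseteq \Delta(X) \cap B(0,r)^2 - [\{f \le r\} \times (C \cap M)] \cap B(0,t)^2. \tag{4.10} \]

Assume further that $f_n$ and $f$ satisfy

\[ \exists\, n_0 \in \mathbb{N},\ \alpha \in \mathbb{R} \text{ such that } \max\Big\{ \sup_{n \ge n_0} \inf_{C_n \cap M} f_n,\ \inf_{C \cap M} f \Big\} \le \alpha \text{ and} \tag{4.11} \]

there exists $\rho_0 > 0$ such that

\[ B(0,\rho_0) \supseteq \bigcup_{n \ge n_0} \{f_n \le \alpha + 1\} \cap C_n \cap M \quad\text{and}\quad B(0,\rho_0) \supseteq \{f \le \alpha + 1\} \cap C \cap M. \tag{4.12} \]

Then for any fixed $\rho > \max\{2r + t, \rho_0, \alpha + 1\}$, and all $n \ge n_0$ for which

\[ d_{\rho+s}(f_n, f) + \frac{2r + 2s + \rho}{s}\, d_{\rho+2s}(C_n, C) < s, \quad\text{and}\quad d_{\rho+2s}(C_n, C) < s, \tag{4.13} \]

we have

\[ \Big| \inf_{C_n \cap M} f_n - \inf_{C \cap M} f \Big| \le d_\rho(f_n + \delta_{C_n \cap M},\ f + \delta_{C \cap M}) \le \frac{2r + s + \rho}{s} \left[ d_{\rho+s}(f_n, f) + \frac{2r + 2s + \rho}{s}\, d_{\rho+2s}(C_n, C) \right]. \]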


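Proof. For $n \ge n_0$, $\varphi := f + \delta_{C \cap M}$ and $\varphi_n := f_n + \delta_{C_n \cap M}$ satisfy the
hypotheses of Proposition 4 and so

\[ \Big| \inf_{C_n \cap M} f_n - \inf_{C \cap M} f \Big| \le d_{\alpha(\rho_0)}(\varphi, \varphi_n) \le d_\rho(\varphi, \varphi_n). \]

The estimates for $d_\rho(\varphi, \varphi_n)$ will now follow from two applications of
Proposition 5. Indeed, from (4.9) and Proposition 5, whenever $\rho > 2r + t$ and $n$ is such
that $d_{\rho+2s}(C_n, C) < s$, then $d_{\rho+s}(C_n \cap M, C \cap M) \le \frac{2r + 2s + \rho}{s}\, d_{\rho+2s}(C_n, C)$.
Taking $n$ to be such that (4.13) holds, we find that
$d_{\rho+s}(f_n, f) + d_{\rho+s}(C_n \cap M, C \cap M) < s$, so from (4.10) and Proposition 5 again,

\[ d_\rho(\varphi, \varphi_n) \le \frac{2r + s + \rho}{s} \left[ d_{\rho+s}(f_n, f) + d_{\rho+s}(C_n \cap M, C \cap M) \right] \le \frac{2r + s + \rho}{s} \left[ d_{\rho+s}(f_n, f) + \frac{2r + 2s + \rho}{s}\, d_{\rho+2s}(C_n, C) \right]. \]

If we keep $f_n$ fixed in this process, then only one iteration of Proposition 5
is required, which will lead to a better coefficient for the rate of convergence
than that obtained from taking $d_\rho(f_n, f) = 0$ in Corollary 3.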
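The way the two applications of Proposition 5 compound can be sketched numerically. The Python function below evaluates the right-hand side of the Corollary 3 estimate from given values of $r$, $s$, $\rho$ and the two epi-distances. It is a hedged illustration: the function name and argument names are hypothetical, the distances are assumed to be supplied by the user, and the requirement $\rho > \max\{2r+t, \rho_0, \alpha+1\}$ is not checked.

```python
def corollary3_bound(r, s, rho, d_f, d_C):
    """Right-hand side of the Corollary 3 estimate.

    d_f stands for d_{rho+s}(f_n, f) and d_C for d_{rho+2s}(C_n, C).
    Returns None when the smallness condition (4.13) is violated.
    """
    k_outer = (2 * r + s + rho) / s      # coefficient of the second use of Proposition 5
    k_inner = (2 * r + 2 * s + rho) / s  # coefficient of the first use (at radius rho + s)
    if not (d_C < s and d_f + k_inner * d_C < s):
        return None
    return k_outer * (d_f + k_inner * d_C)

ok = corollary3_bound(r=1.0, s=0.5, rho=4.0, d_f=0.01, d_C=0.001)
too_coarse = corollary3_bound(r=1.0, s=0.5, rho=4.0, d_f=0.4, d_C=0.01)
```

Because the set distance is multiplied by both coefficients, it enters the bound quadratically in $1/s$, which is the effect that Corollary 4 avoids when $f_n$ is kept fixed.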


Corollary 4. Let $C_n$, $C$ and $M$ be closed convex subsets of a Banach space
$X$ and let $f$ be a proper closed convex $\overline{\mathbb{R}}$-valued function on $X$ with $f \ge 0$
and Attouch-Wets convergence $C_n \to C$. Suppose that for some $s > 0$, $t > 0$
and $r > 0$,

\[ B(0,s)^2 \subseteq \Delta(X) \cap B(0,r)^2 - [(\{f \le r\} \cap M) \times C] \cap B(0,t)^2. \tag{4.14} \]

Assume further that for some $n_0 \in \mathbb{N}$, $\alpha \in \mathbb{R}$ and $\rho_0 > 0$,

\[ \inf_{C_{n_0} \cap M} f \le \alpha \quad\text{and}\quad B(0,\rho_0) \supseteq \{f \le \alpha + 1\} \cap C \cap M. \tag{4.15} \]

Then for any fixed $\rho > \max\{2r + t, \rho_0, \alpha + 1\}$, and all $n \ge n_0$ for which

\[ d_{\rho+s}(C_n, C) < s, \tag{4.16} \]

we have

\[ \Big| \inf_{C_n \cap M} f - \inf_{C \cap M} f \Big| \le d_\rho(f + \delta_{C_n \cap M},\ f + \delta_{C \cap M}) \le \frac{2r + s + \rho}{s}\, d_{\rho+s}(C_n, C). \]


Proof. Similar to that of Corollary 3, but with only one appeal to
Proposition 5.


4.6 Verification of the constraint qualification 


Our intention will be to apply the results of the preceding section with
$X = \ell^1 \times \ell^1$ (with the box norm), so the norm on $X \times X$ will be the four-fold product
of the norm on $\ell^1$. From now on we will not notationally distinguish balls in
the various product spaces, the dimensionality being clear from context.

Before we can apply the convergence theory of Section 4.5, we need to check 
that the constraint qualifications (4.9) and (4.10) (or (4.14)) are satisfied. 
This will be the main concern in this section. First, we consider some more 
readily verifiable sufficient conditions for (4.9) and (4.10) to hold. 


Lemma 5. Suppose that for sets $C$ and $M$ we have

\[ B(0,s) \subseteq C \cap B(0,\sigma) - M \tag{4.17} \]

for some $s$ and $\sigma$. Then it follows that

\[ B(0,s/2)^2 \subseteq \Delta(X) \cap B(0,\sigma + 3s/2)^2 - (C \times M) \cap B(0,\sigma + s)^2 \tag{4.18} \]

(of the form (4.9)). Next, suppose we have

\[ B(0,\mu) \subseteq \{f \le \nu\} \cap B(0,\lambda) - (C \cap M) \cap B(0,\lambda) \tag{4.19} \]

for some $\lambda$, $\mu$ and $\nu$. Then it follows that

\[ B(0,\mu/2)^2 \subseteq \Delta(X) \cap B(0,\mu/2 + \lambda)^2 - (\{f \le \nu\} \times (C \cap M)) \cap B(0,\lambda)^2 \tag{4.20} \]

(of the form (4.10)). Also,

\[ B(0,s) \subseteq C \cap B(0,\sigma) - \{f \le r\} \cap M \cap B(0,\sigma + s) \tag{4.21} \]

implies

\[ B(0,s/2)^2 \subseteq \Delta(X) \cap B(0,\sigma + 3s/2)^2 - (C \times (\{f \le r\} \cap M)) \cap B(0,\sigma + s)^2 \tag{4.22} \]

(which is of the form of (4.14)).


Proof. Now (4.17) implies

\[ B(0,s) \subseteq C \cap B(0,\sigma) - M \cap B(0,\sigma + s). \]

Place $D := \Delta(X) - (C \times M) \cap B(0,\sigma + s)^2$. Then, if $P$ denotes the subtraction
map taking $(x,y)$ to $y - x$, we have

\[ P(D) = C \cap B(0,\sigma + s) - M \cap B(0,\sigma + s) \supseteq C \cap B(0,\sigma) - M \cap B(0,\sigma + s) \supseteq B(0,s), \]

and hence

\[ B(0,s/2)^2 \subseteq P^{-1}(B(0,s)) \subseteq P^{-1}P(D) = D + P^{-1}(0) = D + \Delta(X) = D \]

since $\Delta(X) + \Delta(X) = \Delta(X)$. Thus

\[ B(0,s/2)^2 \subseteq \Delta(X) - (C \times M) \cap B(0,\sigma + s)^2, \]

from which (4.18) clearly follows (and is of the form (4.9) for suitable $r$, $s$
and $t$).

Next, suppose we have (4.19). If we define $D := \Delta(X) - (\{f \le \nu\} \times (C \cap M)) \cap B(0,\lambda)^2$,
then similarly $P(D) = \{f \le \nu\} \cap B(0,\lambda) - (C \cap M) \cap B(0,\lambda) \supseteq B(0,\mu)$
so, proceeding as above, we obtain (4.20) (which is of the form (4.10)).

The last assertion follows from the first on substituting $\{f \le r\} \cap M$ for
$M$ therein.


In this chapter, the objective functions f will always be chosen such that 
(4.17) will imply (4.19) or (4.21). Hence in this section we focus on verification 
of (4.17). 

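The constraints of $M^{(e)}$ have the form $Ae = b$, where $m = 2m_1 + 2m_2 + 2m_3$
and

\[ b = (\operatorname{Re} \hat w(\bar z_1), \operatorname{Im} \hat w(\bar z_1), \ldots, \operatorname{Re} \hat w(\bar z_{m_1}), \operatorname{Im} \hat w(\bar z_{m_1}), 0, \ldots, 0, \ldots, 0)^T \in \mathbb{R}^m \]

with $A : \ell^1 \to \mathbb{R}^m$ given by

\[ Ae := (\ldots, \operatorname{Re} \hat e(\bar z_i), \operatorname{Im} \hat e(\bar z_i), \ldots, \operatorname{Re} \hat e(\bar p_j), \operatorname{Im} \hat e(\bar p_j), \ldots, \operatorname{Re} \hat e(\bar v_k), \operatorname{Im} \hat e(\bar v_k), \ldots)^T, \tag{4.23} \]

where $\bar z_i$, $\bar p_j$ and $\bar v_k$ denote distinct elements of the unit disk, and $i$, $j$ and $k$
range over $\{1,\ldots,m_1\}$, $\{1,\ldots,m_2\}$ and $\{1,\ldots,m_3\}$ respectively. Then $A$ is
expressible as a matrix operator of the form

\[ (a_{ij})_{1 \le i \le m;\ 0 \le j < \infty} =
\begin{pmatrix}
1 & \operatorname{Re} \bar z_1 & \operatorname{Re} \bar z_1^2 & \cdots \\
0 & \operatorname{Im} \bar z_1 & \operatorname{Im} \bar z_1^2 & \cdots \\
\vdots & \vdots & \vdots & \\
1 & \operatorname{Re} \bar z_{m_1} & \operatorname{Re} \bar z_{m_1}^2 & \cdots \\
0 & \operatorname{Im} \bar z_{m_1} & \operatorname{Im} \bar z_{m_1}^2 & \cdots \\
1 & \operatorname{Re} \bar p_1 & \operatorname{Re} \bar p_1^2 & \cdots \\
0 & \operatorname{Im} \bar p_1 & \operatorname{Im} \bar p_1^2 & \cdots \\
\vdots & \vdots & \vdots & \\
1 & \operatorname{Re} \bar p_{m_2} & \operatorname{Re} \bar p_{m_2}^2 & \cdots \\
0 & \operatorname{Im} \bar p_{m_2} & \operatorname{Im} \bar p_{m_2}^2 & \cdots \\
1 & \operatorname{Re} \bar v_1 & \operatorname{Re} \bar v_1^2 & \cdots \\
0 & \operatorname{Im} \bar v_1 & \operatorname{Im} \bar v_1^2 & \cdots \\
\vdots & \vdots & \vdots & \\
1 & \operatorname{Re} \bar v_{m_3} & \operatorname{Re} \bar v_{m_3}^2 & \cdots \\
0 & \operatorname{Im} \bar v_{m_3} & \operatorname{Im} \bar v_{m_3}^2 & \cdots
\end{pmatrix} \]

where rows of imaginary parts of the matrix $(a_{ij})$ and of $b$ are omitted whenever
the associated $\bar z_i$, $\bar p_j$ or $\bar v_k$ is real.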

For integer $K$, define $A^{(K)}$ to be the truncated operator taking $\mathbb{R}^K$ into
$\mathbb{R}^m$ given by the matrix

\[ (a_{ij})_{1 \le i \le m,\ 0 \le j < K}. \]
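The row pattern of the truncated matrix can be sketched in a few lines of Python. This is an illustration under assumptions: the interpolation points are passed in as plain complex numbers in the unit disk, the function name `truncated_rows` and the tolerance `tol` used to decide whether a point is real are hypothetical, and the sample points are invented for the example.

```python
def truncated_rows(points, K, tol=1e-12):
    """Rows of a K-column truncation: (Re z^j) and (Im z^j) for j = 0..K-1.

    The row of imaginary parts is omitted when the point is real,
    following the convention stated for the matrix (a_ij).
    """
    rows = []
    for z in points:
        z = complex(z)
        powers = [z ** j for j in range(K)]
        rows.append([p.real for p in powers])
        if abs(z.imag) > tol:
            rows.append([p.imag for p in powers])
    return rows

# Hypothetical sample: one real point and one purely imaginary point.
A3 = truncated_rows([0.5, 0.5j], K=3)
```

Each complex point therefore contributes two real constraints and each real point one, which is why the row count equals $2m_1 + 2m_2 + 2m_3$ only when none of the points is real.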


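It is known, as stated in [9], that $A^{(m)}$ is invertible on $\mathbb{R}^m$. Hence for each
$K \ge m$,

\[ \alpha_K := \sup_{\substack{\beta \in \mathbb{R}^m \\ \|\beta\|_\infty = 1}}\ \inf_{\substack{\xi \in \mathbb{R}^K \\ A^{(K)}\xi = \beta}} \|\xi\|_1 \tag{4.24} \]

is finite. In particular, $\alpha_K \le \alpha_m$ for all $K \ge m$ and

\[ \alpha_m = \big\| [A^{(m)}]^{-1} \big\|, \]

the norm being taken relative to the 1-norm on the range of the inverse
$[A^{(m)}]^{-1}$ and the $\infty$-norm on its domain. Note that $\alpha_K$ satisfies

\[ (\forall \beta \in \mathbb{R}^m)(\exists \xi \in \mathbb{R}^K)\ \big( A^{(K)}\xi = \beta \text{ and } \|\xi\|_1 \le \alpha_K \|\beta\|_\infty \big). \]

Lemma 6. Let $C^{(e)} := \{e \in \ell^1 \mid B_i \le e_i \le A_i;\ i = 0,1,2,\ldots\}$ where the
bounds $B_i$, $A_i$ satisfy: $\exists\, \xi \in \mathbb{R}^K$ such that $A\xi = b$ and $B_i < \xi_i < A_i$ for
$i = 0,1,\ldots,K-1$; and $B_i \le 0 \le A_i$ for $i \ge K$.

Let $K \ge m$, and let $\alpha_K$ be as in (4.24). Also, let

\[ \varepsilon := \min_{0 \le i \le K-1} \{ |A_i - \xi_i|,\ |B_i - \xi_i| \} \]

(so $\varepsilon > 0$). Then

\[ B(0, \varepsilon\alpha_K^{-1}) \subseteq C^{(e)} \cap B(0, \varepsilon + \|\xi\|_1) - M^{(e)} \cap B(0, \varepsilon(1 + \alpha_K^{-1}) + \|\xi\|_1). \]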
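Proof. Place $s := \varepsilon/\alpha_K$ and let $\eta \in B(0,s)$. Set $\bar e := (\xi, 0, 0, \ldots) \in C^{(e)} \cap M^{(e)}$,
so that $\|\bar e\|_1 = \|\xi\|_1$. As $A^{(K)}$ maps onto $\mathbb{R}^m$, there exists $p \in \mathbb{R}^K \subseteq \ell^1$ such that
$Ap = A^{(K)}p = A\eta$ with

\[ \|p\|_1 \le \alpha_K \|A\eta\|_\infty \le \alpha_K \max_{i,j,k} \{ |\hat\eta(\bar z_i)|,\ |\hat\eta(\bar p_j)|,\ |\hat\eta(\bar v_k)| \} \le \alpha_K \|\eta\|_1 \le \varepsilon. \]

Consequently, $p \in C^{(e)} - \bar e$ since for each $i \le K-1$,
$|p_i| \le \varepsilon \le \min_{i<K} \{ |A_i - \xi_i|,\ |B_i - \xi_i| \}$. Thus, since

\[ A(\eta - (p + \bar e)) = A\eta - (Ap + A\bar e) = -A\bar e = -b \]

so that $\eta - (p + \bar e) \in -M^{(e)}$, we have
$\eta = p + \bar e + (\eta - (p + \bar e)) \in C^{(e)} + (-M^{(e)}) = C^{(e)} - M^{(e)}$, as well as

\[ \|p + \bar e\|_1 \le \|p\|_1 + \|\bar e\|_1 \le \varepsilon + \|\bar e\|_1, \]

which implies that $\eta \in C^{(e)} \cap B(0, \varepsilon + \|\bar e\|_1) - M^{(e)} \cap B(0, s + \varepsilon + \|\bar e\|_1)$.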
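The radii appearing in Lemma 6 are directly computable once the bounds, the interior point $\xi$ and the constant $\alpha_K$ are known. The Python sketch below does this with exact rational arithmetic. It assumes that $\alpha_K$ has already been obtained by other means (computing it is a separate problem not addressed here); the function name and the sample data are hypothetical.

```python
from fractions import Fraction

def lemma6_radii(A, B, xi, alpha_K):
    """Return (eps, inner, sigma, rho) for the inclusion of Lemma 6.

    A, B: the first K upper and lower bounds; xi: a point with B_i < xi_i < A_i.
    alpha_K is taken as given (an assumption of this sketch).
    """
    if not all(b < x < a for a, b, x in zip(A, B, xi)):
        raise ValueError("xi must lie strictly between the bounds")
    eps = min(min(abs(a - x), abs(b - x)) for a, b, x in zip(A, B, xi))
    norm_xi = sum(abs(x) for x in xi)
    inner = eps / alpha_K                    # radius of the ball on the left-hand side
    sigma = eps + norm_xi                    # radius cutting C^(e)
    rho = eps * (1 + 1 / alpha_K) + norm_xi  # radius cutting M^(e)
    return eps, inner, sigma, rho

one = Fraction(1)
radii = lemma6_radii(A=[one, one], B=[-one, -one],
                     xi=[Fraction(1, 2), Fraction(-1, 4)], alpha_K=Fraction(2))
```

A point $\xi$ sitting close to one of its bounds makes $\varepsilon$, and hence the inner radius, small, which in turn inflates the coefficients of the later error estimates.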


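Lemma 7. Let $C^{(e)}$, $C^{(u)}$, $C$, $M$ and $M^{(e)}$ be as in Section 4.4, and assume
that there exists $(\bar e, \bar u) \in C \cap M$ of finite length such that

1. $B(0,\tau) \subseteq C^{(e)} \cap B(0,\sigma) - M^{(e)} \cap B(0,\rho)$ for some positive $\rho$, $\sigma$, $\tau$;
2. $\hat n$ has no zeros on the unit circle; and
3. $B(0,\mu) \subseteq C^{(u)} - \bar u$ for some positive $\mu$ with $\mu > \kappa\|d\|_1(\rho + \|\bar e\|_1)$, where $\kappa$
   is given in Lemma 4.

Then, if $s := \min\{\tau,\ \mu - \kappa\|d\|_1(\rho + \|\bar e\|_1)\}$, we have

\[ B(0,s) \subseteq C \cap B(0, \max\{\sigma,\ \mu + \|\bar u\|_1\}) - M. \tag{4.25} \]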


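Proof. Let $(\xi, \eta) \in B(0,s)$. From Assumption 1, $\xi = v - e \in C^{(e)} \cap B(0,\sigma) - M^{(e)} \cap B(0,\rho)$,
so $\xi = v' - e'$ where $v' = v - \bar e \in (C^{(e)} - \bar e) \cap B(0, \sigma + \|\bar e\|_1)$
and $e' = e - \bar e \in (M^{(e)} - \bar e) \cap B(0, \rho + \|\bar e\|_1)$. Place $u = \eta + Te'$, then

\[ \|u\|_1 \le \|\eta\|_1 + \|T\|\,\|e'\|_1 \le \|\eta\|_1 + \|T\|(\|\bar e\|_1 + \rho) \le s + \|T\|(\|\bar e\|_1 + \rho) \le \mu, \]

where the last inequality follows by Remark 1, and so $u \in C^{(u)} - \bar u$ from
Assumption 3 and $u + \bar u \in C^{(u)} \cap B(0, \mu + \|\bar u\|_1)$. Also, $v' + \bar e = v \in C^{(e)} \cap B(0,\sigma)$
and

\[ (\xi, \eta) = (v', u) - (e', Te') \in (v', u) - (M - (\bar e, \bar u)) = (v' + \bar e,\ u + \bar u) - M = (v,\ u + \bar u) - M \subseteq C \cap B(0, \max\{\sigma,\ \mu + \|\bar u\|_1\}) - M. \]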


Remark 4. The existence of $(\bar e, \bar u)$ in $M$ of finite length is ensured, for
instance, under the conditions of Lemma 3. If the bounds $A_i^{(e)}$ and $B_i^{(e)}$ satisfy
$B_i^{(e)} < \bar e_i < A_i^{(e)}$ for $i < l(\bar e)$ (where $l(\cdot)$ denotes length), then Condition 1
of Lemma 7 follows, for suitable constants, from Lemma 6. (Note again, that
this holds for arbitrary $A_i^{(e)}$ and $B_i^{(e)}$ for $i \ge l(\bar e)$, so by making these bounds
decay to zero sufficiently rapidly, as will be shown in Lemma 13 to come, we
can enforce compactness of $C^{(e)}$, which will be essential for the Attouch-Wets
convergence of $C_n$ to $C$.)

If, furthermore, the bounds $A_i^{(u)}$ and $B_i^{(u)}$ are chosen to envelop $\bar u$ ($\forall i$,
$B_i^{(u)} < \bar u_i < A_i^{(u)}$), and to be bounded away from zero by sufficient distance,
for all $i$, Condition 3 of Lemma 7 is also satisfied.


Remark 5. Note that

\[ 0 \in \operatorname{int}(C - M) \]

iff (4.9) holds for some $r$, $s$ and $t$. Indeed, if $0 \in \operatorname{int}(C - M)$ then by Corollary 1
we obtain (4.17) for some $s$ and $\sigma$, which implies (4.18), which is of the form
(4.9) for suitable constants. Conversely, (4.9) implies (where $P(x,y) := y - x$,
as in the proof of Lemma 5)

\[ 0 \in \operatorname{int} P\big(\Delta(X) \cap B(0,r)^2 - (C \times M) \cap B(0,t)^2\big) \subseteq \operatorname{int}\big(C \cap B(0,t) - M \cap B(0,t)\big) \subseteq \operatorname{int}(C - M). \]

If we are interested only in knowing that $C_n \cap M$ Attouch-Wets converges
to $C \cap M$, and not in the actual rate of such convergence, then
$0 \in \operatorname{int}(C - M)$ certainly suffices for the applicability of the results of the
theory to this end. Note however that in this case, the error bounds obtained
in Section 4.7 are now not “computable” since we do not have explicit values
for the constants $r$, $s$ and $t$ appearing in the constraint qualification. A
sufficient condition for $0 \in \operatorname{int}(C - M)$ may be obtained on modification of
Lemma 7.


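Lemma 8. Let $C^{(e)}$, $C^{(u)}$, $C$ and $M$ be as above, and assume that:

1. there exists $(\bar e, \bar u) \in C \cap M$ such that $0 \in \operatorname{core}(C^{(u)} - \bar u)$;
2. $0 \in \operatorname{core}(C^{(e)} - M^{(e)})$; and
3. $\hat n$ has no zeros on the unit circle.

Then $0 \in \operatorname{int}(C - M)$.

Proof. Note first that

\[ \operatorname{cone}(C - (\bar e, \bar u)) = \operatorname{cone}(C^{(e)} - \bar e) \times \operatorname{cone}(C^{(u)} - \bar u) \]

since $C^{(e)} - \bar e$ and $C^{(u)} - \bar u$ are convex sets containing 0. Then (where $T$
denotes the mapping introduced in Remark 1)

\begin{align*}
\operatorname{cone}(C - M) &= \operatorname{cone}(C - (\bar e, \bar u)) - (M - (\bar e, \bar u)) \\
&= \operatorname{cone}(C^{(e)} - \bar e) \times \operatorname{cone}(C^{(u)} - \bar u) - (M - (\bar e, \bar u)) \\
&= \operatorname{cone}(C^{(e)} - \bar e) \times \ell^1 - \{(e,u) \mid e \in M^{(e)} - \bar e,\ d * e + n * u = 0\} \\
&= \operatorname{cone}(C^{(e)} - \bar e) \times \ell^1 - \{(e, Te) \mid e \in M^{(e)} - \bar e\} \\
&\subseteq \operatorname{cone}(C^{(e)} - M^{(e)}) \times \ell^1,
\end{align*}

where the final inclusion is in fact an equality, as follows by noting that if
$(\xi, \eta) \in \operatorname{cone}(C^{(e)} - M^{(e)}) \times \ell^1$, then
$\xi \in \operatorname{cone}(C^{(e)} - M^{(e)}) = \operatorname{cone}(C^{(e)} - \bar e) - (M^{(e)} - \bar e)$,
so $\xi = v - e$ for some $v \in \operatorname{cone}(C^{(e)} - \bar e)$ and $e \in M^{(e)} - \bar e$. Setting
$u := \eta + Te \in \ell^1$ yields
$(\xi, \eta) = (v, u) - (e, Te) \in \operatorname{cone}(C^{(e)} - \bar e) \times \ell^1 - \{(e, Te) \mid e \in M^{(e)} - \bar e\}$.
Thus $\operatorname{cone}(C - M) = \operatorname{cone}(C^{(e)} - M^{(e)}) \times \ell^1$, which by
the assumptions yields $0 \in \operatorname{core}(C - M)$ and the result then follows from
Corollary 1.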


4.6.1 Limitations on the truncation scheme 


In Section 4.7, we will apply Corollary 3 to deduce various convergence
results. For this, it will be necessary that $C_k := C_k^{(e)} \times C_k^{(u)}$ Attouch-Wets
converges to $C = C^{(e)} \times C^{(u)}$, where $C_k^{(\cdot)}$ denote the corresponding
truncations of $C^{(\cdot)}$. Recall that we need a condition of the form (4.17) for the
application of the convergence theory of Section 4.5. This has an untoward
consequence in relation to convergence of truncations of $C^{(u)}$. From Lemma 9
below, we see that Attouch-Wets convergence of $C_k$ to $C$ is impossible unless
we keep $C_k^{(u)} = C^{(u)}$ for all $k$; indeed, if truncations of $C^{(u)}$ are included,
then Attouch-Wets convergence will occur if and only if $C$, and hence $C^{(u)}$, is
locally compact (in the sense of having compact intersections with all closed
balls), which is incompatible with the constraint qualification (4.17), as we
shall observe in Lemma 10.

Further, if instead we try to truncate the space $M$ to form an expanding
family of finite-dimensional subspaces $M_n$, then similarly, any Attouch-Wets
convergence of $M_n$ to $M$ demands local compactness of $M$, which is an
impossibility since the latter has infinite dimension.

We therefore use truncations in the $e$-variable only, yielding the form
$C_k := C_k^{(e)} \times C^{(u)}$. Thus our truncations will generally not consist purely of elements
$(e, u)$ of fixed finite length. It will be shown shortly, however (see the end
of this section), that each $C_k \cap M$ is in fact contained in a finite-dimensional
subspace, but the basis thereof may consist of infinite-length members (in the
$u$-variable). If we wish for these truncations to contain only $(e,u)$ of some
fixed finite length dependent only on $k$, then further assumptions on the plant
will be required (see Lemma 16).

Lemma 9. Let $X$ be a Banach space, let $C_n$ and $C$ be closed convex subsets,
with $C_n \subseteq C$ for all $n$, and $C_n$ Attouch-Wets convergent to $C$. Assume also
for each $n \in \mathbb{N}$ and $\rho > 0$ that $C_n \cap B(0,\rho)$ is compact. Then $C \cap B(0,\rho)$ is
compact whenever $C \cap \operatorname{int} B(0,\rho)$ is nonempty.

Proof. It suffices to show that $C \cap B(0,\rho)$ is totally bounded. Now
$0 \in \operatorname{int}(C - B(0,\rho))$ and hence $B(0,s) \subseteq C \cap B(0,\rho + s) - B(0,\rho)$ for some $s > 0$ (see (4.17)
with $M := B(0,\rho)$). On comparing (4.17) and (4.18) we find the indicator
functions $f_n = \delta_{C_n}$, $f = \delta_C$ and $g = \delta_{B(0,\rho)}$ satisfy a condition of the form
(4.8), so by Proposition 5, $C_n \cap B(0,\rho)$ Attouch-Wets converges to $C \cap B(0,\rho)$.
Let $\varepsilon > 0$. By this convergence, there exists $n$ such that
$C \cap B(0,\rho) \subseteq C_n \cap B(0,\rho) + B(0,\varepsilon/2)$. From the compactness of $C_n \cap B(0,\rho)$, there exist
$x_1,\ldots,x_N$ in $C_n \cap B(0,\rho) \subseteq C \cap B(0,\rho)$ such that $\bigcup_{i=1}^N B(x_i, \varepsilon/2)$ contains
$C_n \cap B(0,\rho)$. Hence $C \cap B(0,\rho) \subseteq \bigcup_{i=1}^N B(x_i, \varepsilon)$.


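Lemma 10. Suppose that $C = C^{(e)} \times C^{(u)}$ is locally compact in the sense of
Lemma 9 and $\hat n$ has no zeros on the unit circle. Then

\[ 0 \notin \operatorname{int}(C - M). \]

Proof. Supposing the contrary, Corollary 1 yields $\rho > 0$ satisfying
$\operatorname{cone}(C \cap B(0,\rho) - M) = \ell^1 \times \ell^1$, which in turn implies that

\[ (\forall (\xi, \eta) \in \ell^1 \times \ell^1)(\exists\, e \in M^{(e)} - \bar e) \text{ with } \xi + e \in \operatorname{cone}(C^{(e)} \cap B(0,\rho) - \bar e) \text{ and } \eta + Te \in \operatorname{cone}(C^{(u)} \cap B(0,\rho) - \bar u), \tag{4.26} \]

where $T$ is as in Remark 1, and $(\bar e, \bar u)$ is a fixed member of $C \cap B(0,\rho) \cap M$.
Let $\chi \in \ell^1$. By the surjectivity of $A : \ell^1 \to \mathbb{R}^m$ given in (4.23), there exists
$\xi \in \ell^1$ such that for each zero $\bar z$ in $D$ for $\hat n$, $\hat\xi(\bar z) = \hat\chi(\bar z)/\hat d(\bar z)$. Place
$\eta := Z^{-1}\big[(\hat\chi - \hat d\,\hat\xi)/\hat n\big]$. Since $\hat\chi - \hat d\,\hat\xi$ now must have a zero at each (simple) zero
in $D$ for $\hat n$, and the latter has no zeros on the unit circle, we have $\eta \in \ell^1$ by
Lemma 4. Thus $\chi = d * \xi + n * \eta$. With $e \in M^{(e)} - \bar e$ as in (4.26), it follows
(on noting that $d * e + n * Te = 0$ from the definition of $T$) that

\begin{align*}
\chi &= d * \xi + n * \eta \\
&\in d * \operatorname{cone}(C^{(e)} \cap B(0,\rho) - \bar e) + n * \operatorname{cone}(C^{(u)} \cap B(0,\rho) - \bar u) - (d * e + n * Te) \\
&= \operatorname{cone}\big(d * (C^{(e)} \cap B(0,\rho) - \bar e)\big) + \operatorname{cone}\big(n * (C^{(u)} \cap B(0,\rho) - \bar u)\big) \\
&\subseteq \operatorname{cone}\big[d * (C^{(e)} \cap B(0,\rho) - \bar e) + n * (C^{(u)} \cap B(0,\rho) - \bar u)\big],
\end{align*}

where the latter inclusion follows since both $C^{(e)} \cap B(0,\rho) - \bar e$ and
$C^{(u)} \cap B(0,\rho) - \bar u$ are convex sets containing 0. Since $\chi \in \ell^1$ is arbitrary,

\[ \operatorname{cone}\big[d * (C^{(e)} \cap B(0,\rho) - \bar e) + n * (C^{(u)} \cap B(0,\rho) - \bar u)\big] = \ell^1, \]

so the compact convex set $d * (C^{(e)} \cap B(0,\rho) - \bar e) + n * (C^{(u)} \cap B(0,\rho) - \bar u)$
has a nonempty core (Definition 1) and hence by Proposition 1, a nonempty
interior. However, this latter property is forbidden for any compact subset of
an infinite-dimensional normed space. Thus we arrive at a contradiction.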


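We end this section with the promised verification that each truncation
$C_k \cap M = (C_k^{(e)} \times C^{(u)}) \cap M$ is indeed finite-dimensional.

Lemma 11. Under the assumptions of this section, $C_k \cap M$ is of finite
dimension for each $k$.

Proof. Assume that $C \cap M$ has a member $(\bar e, \bar u)$ with $\bar e$ of finite length. Now
let $(e,u) \in C_k \cap M$, and let $K := \max\{k, l(\bar e)\}$. Since $d * (e - \bar e) + n * (u - \bar u) = 0$
and $e - \bar e \in M^{(e)} - \bar e$, then $u - \bar u = T(e - \bar e)$. Since $e - \bar e \in (M^{(e)} - \bar e) \cap \mathbb{R}^K$,
we can write $e - \bar e = \sum_{i=1}^K \alpha_i e^{(i)}$ where $\{e^{(i)}\}_{i=1}^K$ is some spanning set for
$(M^{(e)} - \bar e) \cap \mathbb{R}^K$. Placing $u^{(i)} := Te^{(i)} \in \ell^1$, we obtain

\[ \sum_{i=1}^K \alpha_i u^{(i)} = T\Big(\sum_{i=1}^K \alpha_i e^{(i)}\Big) = T(e - \bar e) = u - \bar u \]

so that $u \in \bar u + \operatorname{span}\{u^{(i)}\}_{i=1}^K$, and hence $C_k \cap M \subseteq \mathbb{R}^K \times (\bar u + \operatorname{span}\{u^{(i)}\}_{i=1}^K)$,
an affine subspace of finite dimension. Note again that there is no guarantee that
any of the $u^{(i)}$ has finite length.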


4.7 Convergence of approximates 


As asserted in the opening paragraph of Section 4.6.1, if we wish to apply
convergence theory, we cannot simultaneously truncate in both $e$ and $u$.
Accordingly, our truncations will always be of the form $C_n = C_n^{(e)} \times C^{(u)}$.
The following two lemmas show that compactness of $C^{(e)}$ is essential for the
Attouch-Wets convergence of the truncations $C_n^{(e)}$ and $C_n$.


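Lemma 12. Let $C$, $C^{(e)}$ and $C^{(u)}$ be as usual, with also
$\sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\} < +\infty$. Then

\[ d_\rho(C_n, C) = \sum_{i \ge n} \max\{|A_i^{(e)}|, |B_i^{(e)}|\} \quad\text{for any}\quad \rho > \sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\} \]

and $C_n$ Attouch-Wets converges to $C$.

Proof. Since $C_n^{(e)} \subseteq C^{(e)}$ for all $n$, we have

\[ d_\rho(C_n, C) = d_\rho(C_n^{(e)} \times C^{(u)},\ C^{(e)} \times C^{(u)}) = d_\rho(C_n^{(e)}, C^{(e)}) = e_\rho(C^{(e)}, C_n^{(e)}), \]

and we compute the latter. Let $e \in C^{(e)} = C^{(e)} \cap B(0,\rho)$. Then
$d(e, C_n^{(e)}) = \|e - e_{(n)}\| = \sum_{i \ge n} |e_i|$ where $e_{(n)}$ denotes the truncation to length $n$ (that
is, $e_{(n)} = (e_0, \ldots, e_{n-1}, 0, 0, \ldots)$), and hence as $n \to \infty$

\[ e_\rho(C^{(e)}, C_n^{(e)}) = \sup_{e \in C^{(e)}} \sum_{i \ge n} |e_i| = \sum_{i \ge n} \max\{|A_i^{(e)}|, |B_i^{(e)}|\} \to 0. \]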
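Since Lemma 12 expresses the distance between the truncated and the full constraint set as a tail sum of the bounds, that quantity is easy to tabulate. The Python sketch below does so exactly for a finite list of bounds. It assumes, for illustration only, that the bounds vanish beyond the listed indices and decay geometrically; the names `tail_sum` and `first_index_below` and the sample data are hypothetical.

```python
from fractions import Fraction

def tail_sum(A, B, n):
    """Sum over i >= n of max(|A_i|, |B_i|), for bounds that vanish past the lists."""
    return sum(max(abs(a), abs(b)) for a, b in zip(A[n:], B[n:]))

def first_index_below(A, B, s):
    """Smallest truncation length n whose tail sum is strictly below s."""
    for n in range(len(A) + 1):
        if tail_sum(A, B, n) < s:
            return n
    return None

# Hypothetical geometric bounds: A_i = 2^-i, B_i = -2^-(i+1), i = 0..19.
A = [Fraction(1, 2 ** i) for i in range(20)]
B = [-Fraction(1, 2 ** (i + 1)) for i in range(20)]
distance_at_3 = tail_sum(A, B, 3)
n_small = first_index_below(A, B, Fraction(1, 100))
```

A search of this kind is what one would use to find the first $n$ satisfying a smallness condition of the form $\sum_{i \ge n} \max\{|A_i^{(e)}|, |B_i^{(e)}|\} < s$.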


Lemma 13. The set $C^{(e)}$ is compact in $\ell^1$ if and only if the bounds satisfy

\[ \sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\} < +\infty. \]

Proof. Let $i_0$ be such that $B_i^{(e)} \le 0 \le A_i^{(e)}$ for all $i \ge i_0$. If $C^{(e)}$ is compact,
then $x_n := (A_0^{(e)}, \ldots, A_n^{(e)}, 0, 0, \ldots) \in C^{(e)}$ ($n \ge i_0$) must have a convergent
subsequence, along which we then have the uniform boundedness of the norms
$\|x_{n_k}\|_1 = \sum_{i=0}^{n_k} |A_i^{(e)}|$, and since these increase with $k$, $\sum_{i=0}^\infty |A_i^{(e)}|$ is finite.
Similarly, $\sum_{i=0}^\infty |B_i^{(e)}|$ is finite.

Conversely, if $\sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\} < +\infty$, the compactness of $C^{(e)}$
follows from Lemma 9 since its truncations $C_n^{(e)}$ are all compact and Attouch-Wets
converge to $C^{(e)}$ by Lemma 12.


The next lemma shows that $C \cap M$ is always bounded whenever the bounds
on $e$ define sequences in $\ell^1$. This ensures that condition (4.12) will always be
satisfied for any objective $f$.


Lemma 14. Suppose (as usual here) that $\hat n$ has no zeros on the unit circle
and that all its zeros in the unit disk are simple. Then $C \cap M \subseteq B(0,\rho_0)$,
where

\[ \rho_0 = \max\Big\{ \sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\},\ \ \kappa\Big( \|b\|_1 + \|d\|_1 \sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\} \Big) \Big\}, \]

where $\kappa$ is as in Lemma 4 and $b := w * d - n * (\beta_{tc} + \beta_{ss})$.


Proof. Let $(e,u) \in C \cap M$, then from the relation $d * e + n * u = b$, $\hat b - \hat d\,\hat e$
has zeros at each zero in $D$ of $\hat n$, and since $u = Z^{-1}\big((\hat b - \hat d\,\hat e)/\hat n\big)$, Lemma 4 yields
that $\|u\|_1 \le \kappa\|b - d * e\|_1 \le \kappa(\|b\|_1 + \|d\|_1\|e\|_1)$. From this, and the relation
$\|e\|_1 \le \sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\}$, the result follows.

Assembling all the parts we obtain our main result.


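Theorem 1. Let $f_n$ and $f$ be proper closed convex $\overline{\mathbb{R}}$-valued functions, with
$f_n$ Attouch-Wets convergent to $f$ and $f \ge 0$. Also, assume the following for
$C$ and $M$:

1. $\hat n$ has no zeros on the unit circle, and all its zeros in the unit disk are
   simple;

2. The bounds $\{(B_i^{(e)}, A_i^{(e)})\}_{i=0}^\infty$ characterizing $C^{(e)}$ form sequences in $\ell^1$, and
   also satisfy the requirement that for some $K \ge m$, $\exists\, \xi \in \mathbb{R}^K$ such that
   $A\xi = b$ with $B_i^{(e)} < \xi_i < A_i^{(e)}$ for $i = 0,1,\ldots,K-1$; and $B_i^{(e)} \le 0 \le A_i^{(e)}$
   for $i \ge K$;

3. There exists $(\bar e, \bar u) \in C \cap M$ of finite length such that $B(0,\bar\mu) \subseteq C^{(u)} - \bar u$
   for some positive $\bar\mu$ with $\bar\mu > \kappa\|d\|_1\big[\varepsilon(1 + \alpha_K^{-1}) + \|\xi\|_1 + \|\bar e\|_1\big]$, where $\kappa$ is
   given in Lemma 4 and $\alpha_K$ in (4.24);

4. $\gamma := \max\{\sup_{n \ge n_0} \inf_{C_n \cap M} f_n,\ \inf_{C \cap M} f\}$ is finite;

5. $B(0,\mu) \subseteq \{f \le \nu\} \cap B(0,\lambda) - (C \cap M) \cap B(0,\lambda)$ for some $\lambda$, $\mu$ and $\nu$ (that
   is, (4.19) holds).

Define the constants

\[ \varepsilon := \min_{0 \le i \le K-1} \{ |A_i^{(e)} - \xi_i|,\ |B_i^{(e)} - \xi_i| \} \tag{4.27} \]
\[ s' := \tfrac{1}{2} \min\big\{ \varepsilon\alpha_K^{-1},\ \bar\mu - \kappa\|d\|_1\big[\varepsilon(1 + \alpha_K^{-1}) + \|\xi\|_1 + \|\bar e\|_1\big] \big\} \tag{4.28} \]
\[ s := \min\{s', \mu\} \tag{4.29} \]
\[ r := \max\{\varepsilon + \|\xi\|_1 + 3s',\ \bar\mu + \|\bar u\|_1 + 3s',\ \mu/2 + \lambda,\ \nu\} \tag{4.30} \]
\[ t := \max\{\varepsilon + \|\xi\|_1 + 2s',\ \bar\mu + \|\bar u\|_1 + 2s',\ \lambda\}. \tag{4.31} \]

Let $\rho_0$ be as in Lemma 14. Then for any fixed $\rho$ satisfying

\[ \rho > \max\Big\{ 2r + t,\ \rho_0,\ \gamma + 1,\ \sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\} \Big\} \]

and all $n \ge n_0$ for which $\sum_{i \ge n} \max\{|A_i^{(e)}|, |B_i^{(e)}|\} < s$ and

\[ d_{\rho+s}(f_n, f) + \frac{2r + 2s + \rho}{s} \sum_{i \ge n} \max\{|A_i^{(e)}|, |B_i^{(e)}|\} < s, \]

it follows that

\[ \Big| \inf_{C_n \cap M} f_n - \inf_{C \cap M} f \Big| \le \frac{(2r + s + \rho)(2r + 2s + \rho)}{s^2} \sum_{i \ge n} \max\{|A_i^{(e)}|, |B_i^{(e)}|\} + \frac{2r + s + \rho}{s}\, d_{\rho+s}(f_n, f). \]

Proof. By assumption 2 and Lemma 6, we obtain an inclusion of the form of
(4.17):

\[ B(0, \varepsilon\alpha_K^{-1}) \subseteq C^{(e)} \cap B(0, \varepsilon + \|\xi\|_1) - M^{(e)} \cap B(0, \varepsilon(1 + \alpha_K^{-1}) + \|\xi\|_1). \]

This, along with assumptions 1 and 3, may be inserted into Lemma 7 to yield

\[ B(0, 2s') \subseteq C \cap B(0, \max\{\varepsilon + \|\xi\|_1,\ \bar\mu + \|\bar u\|_1\}) - M. \]

Lemma 5 then gives (where $r' := \max\{\varepsilon + \|\xi\|_1,\ \bar\mu + \|\bar u\|_1\}$)

\[ B(0, s') \subseteq \Delta(X) \cap B(0, r' + 3s') - (C \times M) \cap B(0, r' + 2s'), \]

which, along with (4.20) (a consequence of assumption 5 via Lemma 5) yields
(4.9) and (4.10) (in Corollary 3) for $r$, $s$ and $t$ as above.

Further, assumption 4 gives (4.11), and (4.12) follows from Lemma 14
with the indicated value for $\rho_0$. Noting the explicit form for $d_\rho(C_n, C)$ in
Lemma 12 (for $\rho > \sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\}$), the result may now be read
from Corollary 3.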


In particular, if $f_n = f$ for all $n$, we see that

\[ \Big| \inf_{C_n \cap M} f - \inf_{C \cap M} f \Big| \le \frac{(2r + s + \rho)(2r + 2s + \rho)}{s^2} \sum_{i \ge n} \max\{|A_i^{(e)}|, |B_i^{(e)}|\}. \]

However, in this case, we can obtain better constants by using Corollary 4 in
place of Corollary 3, which will require the condition (4.21) (or (4.14)). To
illustrate, if $f$ is like a norm, say, $f(e,u) := \|e\|_1 + \zeta\|u\|_1$ ($\zeta > 0$), then for
any $r > 0$, $\{f \le r\} \supseteq B(0, r/(2\max\{1,\zeta\}))$ so if (4.17) holds (for some $s$, $\sigma$)
then taking $r = 2(s + \sigma)\max\{1,\zeta\}$ gives

\[ C \cap B(0,\sigma) - \{f \le r\} \cap M \cap B(0,\sigma + s) \supseteq C \cap B(0,\sigma) - M \cap B(0,\sigma + s) \supseteq B(0,s) \quad\text{by (4.17)} \]

so (4.21), and hence (4.14), holds for suitable constants. Accordingly, we
arrive at the following result.


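Theorem 2. Let $f : \ell^1 \times \ell^1 \to \mathbb{R}$ have the form $f(e,u) := \|e\|_1 + \zeta\|u\|_1$
for some $\zeta > 0$. Assume Conditions 1, 2 and 3 of Theorem 1 hold, and
Condition 4 is replaced by

\[ \gamma := \inf_{C_{n_0} \cap M} f < \infty \quad\text{for some } n_0. \]

Define $\varepsilon$ as in (4.27), and $s := s'$ by (4.28), with

\[ r := 2\max\{1,\zeta\}\,\big(\max\{\varepsilon + \|\xi\|_1,\ \bar\mu + \|\bar u\|_1\} + 2s\big) \quad\text{and} \tag{4.32} \]
\[ t := \max\{\varepsilon + \|\xi\|_1 + 2s,\ \bar\mu + \|\bar u\|_1 + 2s\}. \tag{4.33} \]

Then, for any fixed $\rho$ satisfying

\[ \rho > \max\Big\{ 2r + t,\ \rho_0,\ \gamma + 1,\ \sum_{i=0}^\infty \max\{|A_i^{(e)}|, |B_i^{(e)}|\} \Big\} \]

(where $\rho_0$ appears in Lemma 14) and all $n \ge n_0$ for which

\[ \sum_{i \ge n} \max\{|A_i^{(e)}|, |B_i^{(e)}|\} < s, \]

we have

\[ \Big| \inf_{C_n \cap M} f - \inf_{C \cap M} f \Big| \le \frac{2r + s + \rho}{s} \sum_{i \ge n} \max\{|A_i^{(e)}|, |B_i^{(e)}|\}. \]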


Proof. This follows along similar lines to that for Theorem 1, but uses the 
last displayed relation before Theorem 2 to obtain (4.21) from (4.17), so that 
(4.14) is obtained for the above r, s and t and we may apply Corollary 4. 
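The constants of Theorem 2 and the resulting estimate can be assembled mechanically. The Python sketch below takes the quantities $\zeta$, $\varepsilon$, $\|\xi\|_1$, $\bar\mu$, $\|\bar u\|_1$ and $s$ as inputs and evaluates (4.32), (4.33), the lower limit on $\rho$ and the error bound. It is only an illustration: the function names and the sample values are hypothetical, and the inputs (including $\rho_0$ from Lemma 14 and $\gamma$) are assumed to have been determined beforehand.

```python
def theorem2_constants(zeta, eps, norm_xi, mu_bar, norm_u_bar, s):
    """Constants r and t as in (4.32) and (4.33)."""
    base = max(eps + norm_xi, mu_bar + norm_u_bar)
    r = 2 * max(1.0, zeta) * (base + 2 * s)
    t = base + 2 * s
    return r, t

def rho_lower_limit(r, t, rho_0, gamma, bound_sum):
    """rho must strictly exceed this value for the estimate to apply."""
    return max(2 * r + t, rho_0, gamma + 1, bound_sum)

def theorem2_bound(r, s, rho, tail):
    """Error bound of Theorem 2; tail is the sum over i >= n of max(|A_i|, |B_i|).

    Returns None when the requirement tail < s is not met.
    """
    if not tail < s:
        return None
    return (2 * r + s + rho) / s * tail

r, t = theorem2_constants(zeta=2.0, eps=0.5, norm_xi=1.0,
                          mu_bar=1.0, norm_u_bar=0.5, s=0.25)
limit = rho_lower_limit(r, t, rho_0=3.0, gamma=1.0, bound_sum=2.0)
bound = theorem2_bound(r, s=0.25, rho=20.0, tail=0.01)
```

The bound is linear in the tail sum with a single factor $(2r + s + \rho)/s$, in contrast with the product of two such factors that arises when Corollary 3 is used with $f_n = f$.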


In summary: we have obtained

• that $\inf_{C \cap M} f$ provides the exact lower bound for the performance of
  rational controllers for the $\ell^1$ control problem, and

• computable convergence estimates for the approximating truncated problems.


Note further that these results are obtained for the case where the
hard-bound set $C$ (or time-domain template) has no interior (since $C^{(e)}$ is assumed
compact and hence has an empty interior). This then extends a result of [14]
on such approximations (in the particular two-block control problem we
consider) to cases where $\operatorname{int} C$ is empty. We note however that the cited result
from [14] in fact has an alternate short proof by an elementary convexity
argument (see Lemma 15 below) once the density in $M$ of the subset of members
of finite length is demonstrated. (This density is established, in the general
“multi-block” case, in [19]. For the special two-block setup we consider, this
property is proved in [26].) The above convergence results should be readily
extendible to the multi-block formalism of [14].


Lemma 15. Let $C$, $M$ and $M_0$ be convex sets, with $M = \overline{M_0}$ and
$(\operatorname{int} C) \cap M \ne \emptyset$. Then $\overline{C \cap M} = \overline{C \cap M_0}$.


Proof. Let $x \in C \cap M$ and $x_0 \in (\operatorname{int} C) \cap M$. Then for each $0 < \lambda < 1$,
$x_\lambda := \lambda x_0 + (1 - \lambda)x \in (\operatorname{int} C) \cap M$, and $x_\lambda \to x$ for $\lambda \to 0$. For each $\lambda$, the
density of $M_0$ yields approximates to $x_\lambda$ from $M_0$ which, if sufficiently close,
must be in $(\operatorname{int} C) \cap M_0$.

This argument in fact leads to a very quick proof of [14, Theorem 5.4]
(whose original proof contains a flaw, detailed in [27]) which asserts the
equality $\inf_{C \cap M} \|\cdot\|_1 = \inf_{C \cap M_0} \|\cdot\|_1$ under a slightly stronger assumption,
which in fact implies nonemptiness of $(\operatorname{int} C) \cap M$. To see the applicability of
Lemma 15 to the cited result from [14], we temporarily adopt the notation
thereof. Assign $C$ and $M$ as follows:


C= {P € enke | Atemp® < Dtemp } and 
M oa {P € Lene | AfeasP = Dteas } 5 


where biemp € 0°, Breas € RO x fe", Atemp 2 Ef2""" — 1° and Aeas': 
e*"~ = R& x I~" are bounded linear, and where the symbol < stands 
for the partial order on [© induced by its standard positive cone P*. The 
assumption of [14, Theorem 5.4] is that Diemp — Atemp®o € int P* for some 
&y € M. However, the continuity of Atemp implies that &p € int C and hence 
®o € (int C)N M, which is the assumption of Lemma 15. This, coupled with 
the density of Mo := M, in M, gives the result. 

As was discussed in Section 4.6.1, the approximating problems are
constructed by truncating in the $e$-variable only, otherwise the method fails.
However, the truncated constraint-sets $C_k \cap M$ satisfy

\[ \dim \operatorname{span}\, C_k \cap M \le 2k \quad \text{(for large } k\text{)} \]

by Lemma 11, whereby the approximating minimizations are indeed over
finite-dimensional sets. Since in general $(e,u) \in C_k \cap M$ does not imply that
$u$ has finite length, one may ask under what conditions it may be possible for
there to exist, for each $k$, a uniform bound $m(k)$ to the length of $u$ whenever
$(e,u) \in C_k \cap M$. In this case, the truncated sets $C_k \cap M$ would resemble those
obtained by the more natural method of simultaneously truncating both $e$
and $u$ (a strategy that fails to yield convergence, in general, by the results
of Section 4.6.1). From the lemma to follow, such a property can hold only
when the plant has no minimum-phase zeros (that is, no zeros outside the unit
circle). Hence, except for some highly restrictive cases, $C_k \cap M$ will always
contain some $u$ of infinite length.


Lemma 16. Suppose that the assumptions of Theorem 1 hold, and assume
further that $\hat n$ has no zeros outside the open unit disk. (As usual, all zeros
are assumed simple.) Let $(\bar e, \bar u)$ be in $C \cap M$ with both $\bar e$ and $\bar u$ having finite
length. Then for any $k$ and any $(e,u) \in C_k \cap M$,

\[ l(u) \le \max\{ l(\bar u),\ l(d) - l(n) + \max\{l(\bar e), k\} \}, \]

where $l(\cdot)$ denotes length.

Moreover, assuming that the conditions of Remark 4 hold (with
$A_i^{(e)} > 0 > B_i^{(e)}$ for $i \ge l(\bar e)$), then if for all $k$, each $(e,u) \in C_k \cap M$ has $u$ of finite
length, it follows that $\hat n$ cannot have zeros outside the unit disk.

Proof. Let $(e,u) \in C_k \cap M$ and let $(e_1, u_1) := (e,u) - (\bar e, \bar u)$. Then
$l(e_1) \le \max\{l(e), l(\bar e)\}$ and similarly for $u_1$. Also $u_1 = -d * Z^{-1}(\hat e_1/\hat n)$ from the
corresponding convolution relation defining $M$. Since $\hat e_1$ is a polynomial having
zeros at each zero of $\hat n$ in the unit disk and hence the whole plane (recall
$e_1 \in M^{(e)} - \bar e$), we have that $\hat u_1$ is a polynomial of degree at most
$l(d) - l(n) + l(e_1) - 1 \le l(d) - l(n) + \max\{l(e), l(\bar e)\} - 1$, and so, as $u = u_1 + \bar u$,
the result follows.

For the second assertion, suppose $\hat n(z_0) = 0$ for some $|z_0| > 1$. With $(\bar e, \bar u)$
as above, it follows from the closed-loop equation $d * \bar e + n * \bar u = w * d - n * \beta =: b$
(where $\beta := \beta_{tc} + \beta_{ss}$) that $\hat w(z_0)$ is finite and $\hat{\bar e}(z_0) = \hat w(z_0)$. Let $k$ exceed
both the number of constraints defining $M^{(e)}$ and the length of $\bar e$, and let
$e \in M^{(e)} \cap \mathbb{R}^k$. If $u := Z^{-1}\big((\hat b - \hat d\,\hat e)/\hat n\big)$, the interpolation constraints on $e$ ensure
that $\hat u$ has no poles in the closed unit disk (so $u \in \ell^1$) and hence $(e,u) \in M$.
Also, since from the assumptions, $\operatorname{cone}(C_k^{(e)} - \bar e) \supseteq \mathbb{R}^k$ and $\operatorname{cone}(C^{(u)} - \bar u) = \ell^1$,
we have

\begin{align*}
(e,u) - (\bar e, \bar u) &\in (M - (\bar e, \bar u)) \cap (\mathbb{R}^k \times \ell^1) \\
&\subseteq (M - (\bar e, \bar u)) \cap \big(\operatorname{cone}(C_k^{(e)} - \bar e) \times \operatorname{cone}(C^{(u)} - \bar u)\big) \\
&= (M - (\bar e, \bar u)) \cap \operatorname{cone}(C_k - (\bar e, \bar u)) \quad \text{(recalling that } C_k := C_k^{(e)} \times C^{(u)}\text{)} \\
&= \operatorname{cone}(C_k \cap M - (\bar e, \bar u)),
\end{align*}

whence $\lambda((e,u) - (\bar e, \bar u)) \in C_k \cap M - (\bar e, \bar u)$ for some positive $\lambda$. Since now
$\lambda(e,u) + (1 - \lambda)(\bar e, \bar u) \in C_k \cap M$, the hypothesized finiteness of length of
$\lambda u + (1 - \lambda)\bar u$ implies, via the equation

\[ d * (\lambda e + (1 - \lambda)\bar e) + n * (\lambda u + (1 - \lambda)\bar u) = w * d - n * \beta, \]

that $(\lambda e + (1 - \lambda)\bar e)^{\wedge}(z_0) = \hat w(z_0)$ and hence $\hat e(z_0) = \hat w(z_0)$. Since $k$ is
arbitrary, we have shown that every finite-length $e \in M^{(e)}$ satisfies an additional
interpolation constraint $\hat e(z_0) = \hat w(z_0)$ at $z_0$, which yields a contradiction.


Remark 6. Under the assumptions of the preceding lemma, note that for
$k \ge \max\{l(\bar e),\ l(\bar u) - l(d) + l(n)\}$, and $(e,u) \in C_k \cap M$, we have
$l(u) \le k + l(d) - l(n) =: k + l$, so for such $k$, $C_k \cap M$ consists precisely of those elements $(e,u)$
of $C \cap M$ with $e$ of length $k$ and $u$ of length $k + l$.

If $l \le 0$, then

\[ C_n \cap M = \{(e,u) \in C \cap M \mid l(e) \le n,\ l(u) \le n\} =: Q_n. \]

If $l > 0$, then for all $n \ge \max\{l(\bar e),\ l(\bar u) - l(d) + l(n)\}$,

\[ C_n \cap M \subseteq Q_{n+l} \subseteq C_{n+l} \cap M, \]

and hence $Q_n$ Attouch-Wets converges to $C \cap M$, so from Corollary 3,

\[ \inf_{Q_n} f_n \to \inf_{C \cap M} f \]

as $n \to \infty$. Observe that the sets $Q_n$ represent truncations of $C \cap M$ to the
same length $n$ for both $e$ and $u$.


4.7.1 Some extensions 


So far we have considered CQ of the form of an interiority $0 \in \operatorname{int}(C - M)$,
leading, via Proposition 5, to the determination of rates of convergence for
our truncation scheme. If the CQ is weakened to the strong quasi relative
interiority $0 \in \operatorname{sqri}(C - M)$ (meaning $\operatorname{cone}(C - M)$ is a closed subspace), it
is not immediately possible to apply Proposition 5. In this case, then, explicit
convergence estimates may not be obtainable. We may still, however, derive
limit statements of the form $\inf_{C_n \cap M} f \to \inf_{C \cap M} f$ for reasonable $f$. To
achieve this, we use an alternate result [13] on the Attouch-Wets convergence
of sums of functions, based on a sqri-type CQ, but unfortunately, this result
will not provide the estimates obtainable through Proposition 5.

We proceed by first establishing the Attouch-Wets convergence of
$C_n \cap M \to C \cap M$ using [13, Theorem 4.9]. In this context, the required CQ is:
(1) $0 \in \operatorname{sqri}(C - M)$ and $\operatorname{cone}(C - M)$ has closed algebraic complement
$Y$; and (2) that $Y \cap \operatorname{span}(C_n - M) = \{0\}$ for all $n$. Note that (1) implies
(2) since $C_n \subseteq C$ for all $n$, so we need only consider (1). The following two
lemmas provide a sufficient condition for $0 \in \operatorname{sqri}(C - M)$.


Lemma 17. Suppose that:

1. $0 \in \operatorname{sqri}(C^{(e)} - M^{(e)})$;
2. $0 \in \operatorname{core}(C^{(u)} - \bar u)$ for some $(\bar e,\bar u) \in C \cap M$; and
3. $\hat n$ has no zeros on the unit circle.

Then $0 \in \operatorname{sqri}(C - M)$ and $\operatorname{cone}(C - M)$ has a closed algebraic complement.


Proof. By arguing as in Lemma 8, $\operatorname{cone}(C - M) = \operatorname{cone}(C^{(e)} - M^{(e)}) \times l^1$.
Thus it forms a closed subspace, so strong quasi relative interiority is
established. Since $M^{(e)} - \bar e$ is a subspace of finite codimension in $l^1$, the
complementary space $Y_0$ to $\operatorname{cone}(C^{(e)} - M^{(e)})$ is finite dimensional and
hence closed. Clearly then $\operatorname{cone}(C - M)$ has the closed complement
$Y := Y_0 \times \{0\}$.


From Lemma 17 the problem is reduced to finding conditions under which
$0 \in \operatorname{sqri}(C^{(e)} - M^{(e)})$.
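Lemma 18. Let $\bar e \in l^1$ and $C^{(e)} = \{e \in l^1 \mid B_i^{(e)} \le e_i \le A_i^{(e)};\ i = 0,1,2,\ldots\}$,
where the bounds $B_i^{(e)}$ and $A_i^{(e)}$ satisfy:

1. $B_i^{(e)} \le \bar e_i \le A_i^{(e)}$ for all $i \in \mathbb{N}$;

2. $\sum_i |A_i^{(e)}|$ and $\sum_i |B_i^{(e)}| < \infty$;

3. for a subsequence $\{i_k\}_{k=0}^{\infty}$ we have $B_{i_k}^{(e)} = \bar e_{i_k} = A_{i_k}^{(e)}$, and for all $i$ not in
this subsequence, $B_i^{(e)} < \bar e_i < A_i^{(e)}$.

Then $0 \in \operatorname{sqri}(C^{(e)} - M^{(e)})$.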


Proof. After projecting into a suitable subspace of $l^1$, we shall follow the
argument of Lemma 6. Let $P$ denote the projection on $l^1$ onto the subspace
consisting of sequences vanishing on $\{i_k\}_{k=0}^{\infty}$. That is, $(Pe)_i := 0$ for
$i \in \{i_k\}$, and $(Pe)_i := e_i$ otherwise. Evidently $P$ is continuous and maps closed
sets to closed sets. Next, observe that
$(I - P)\operatorname{cone}(C^{(e)} - \bar e) = \operatorname{cone}(I - P)(C^{(e)} - \bar e) = \{0\}$,
since if $e \in (C^{(e)} - \bar e)$ then for all $k$, we have
$e_{i_k} \in [B_{i_k}^{(e)} - \bar e_{i_k},\ A_{i_k}^{(e)} - \bar e_{i_k}] = \{0\}$ so $e_{i_k} = 0$, yielding $(I - P)e = 0$ by the
definition of $P$. Thus we obtain

    $\operatorname{cone}(C^{(e)} - M^{(e)}) = \operatorname{cone} P(C^{(e)} - M^{(e)}) + (I - P)(M^{(e)} - \bar e).$    (4.34)

If we can show that $\operatorname{cone} P(C^{(e)} - M^{(e)}) = Pl^1$ then (4.34) would give

    $\operatorname{cone}(C^{(e)} - M^{(e)}) = Pl^1 + (I - P)(M^{(e)} - \bar e)$
    $\qquad = Pl^1 + P(M^{(e)} - \bar e) + (M^{(e)} - \bar e)$
    $\qquad = Pl^1 + (M^{(e)} - \bar e),$

which must be closed since $Pl^1$ is closed and the linear subspace $M^{(e)} - \bar e$
has finite codimension.

We now verify that $\operatorname{cone} P(C^{(e)} - M^{(e)}) = Pl^1$. (This part of the argument
parallels that of Lemma 6.) Let $\xi \in Pl^1$, with

Illa <agte= age jo amin IAI? — LIB)? — Sil} > 0. 

There exists 7 € R® C1! such that ||7||1 < ax|lél|1 < € with An = AE. Then 
for i ¢ {i,}, we have Bo <€ 4+ < Ale whenever i < K, and for i > K, 
etm = & € (Bo, A), whence P(n + é) € PC (or Pn € P(C™ — @)). 
Also, 7 — € € M®) —@ as A(n — £) = 0, implying Pn — & = Pn — PE € 
P(M© — é). Thus 


.= P= (Piss) 
e P(C™ —e@) —P(M® —@ =P(C — M), 


This shows that $B(0, \alpha_K^{-1}\varepsilon) \cap Pl^1 \subseteq P(C^{(e)} - M^{(e)})$, whence
$\operatorname{cone} P(C^{(e)} - M^{(e)}) = Pl^1$ as required.


Corollary 5. Under the conditions of Lemmas 17 and 18, and for any convex
closed real-valued $f : l^1 \times l^1 \to \mathbb{R}$,

    $\lim_{n \to \infty} \inf_{C_n \cap M} f = \inf_{C \cap M} f.$

Proof. By the lemmas (and the cited result from [13]) the indicator functions
$\delta_{C_n \cap M}$ Attouch-Wets converge to $\delta_{C \cap M}$, and (since dom $f$ has an interior)
by any of the cited sum theorems, $f + \delta_{C_n \cap M} \to f + \delta_{C \cap M}$ also. The result
then follows from Proposition 4.


Our formulation of the control problem (see Section 4.4) was chosen to
permit its re-expression as an optimization over $l^1 \times l^1$. This choice resulted
from a wish to compare with the other methods we described in the
introduction, which used duality theory. Accordingly, we sought to formulate
minimizations over a space such as $l^1$, which has a nice dual (namely, $l^\infty$).
However, this is not the only problem formulation that can be treated by the
methods of this paper.

Recall from Section 4.4 that we considered only stabilizing controllers
$K \in S(P)$ for which the resulting control signal $u$ was restricted to the
subspace $X_a \subseteq l^\infty$. This requirement will now be relaxed to $u \in l^\infty$. This
will be seen to entail only trivial changes to the main results of this section.
(Note that in this case the resulting optimization is taken over $l^1 \times l^\infty$, so
duality methods would not be readily applicable, since the dual $(l^\infty)^*$ has a
complicated characterization.)

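The basic feasible set is now of the form

    $F_0 = \{(e,u) \in l^1 \times l^\infty \mid e = e(K) \text{ for some } K \in S(P),\ u = K * e\},$

where $u$ is free to range through $l^\infty$ instead of its subspace $X_a$. If we define
(where $w \in l^\infty$ and has rational z-transform) the altered sets

    $M := \left\{ (e,u) \in l^1 \times l^\infty \;\middle|\; \begin{array}{lll}
        \hat e(\bar p_i) = 0 & (\bar p_i \text{ pole of } P:\ |\bar p_i| \le 1) & i = 1,..,m_1 \\
        \hat e(\bar z_j) = \hat w(\bar z_j) & (\bar z_j \text{ zero of } P:\ |\bar z_j| \le 1) & j = 1,..,m_2 \\
        \hat e(\bar v_k) = 0 & (\bar v_k \text{ zero of } \hat w:\ |\bar v_k| \le 1) & k = 1,..,m_3
    \end{array} \right\},$

    $M_r := \{(e,u) \in M \mid \hat e, \hat u \text{ rational},\ e \ne 0\},$

the argument of Lemma 2 again yields $M_r = F_0$. Accordingly, we may now
frame a minimization problem over the subset $C \cap M$ of $l^1 \times l^\infty$ where
$C = C^{(e)} \times C^{(u)}$ but now

    $C^{(u)} = \{u \in l^\infty \mid B_i^{(u)} \le u_i \le A_i^{(u)} \text{ for all } i \in \mathbb{N}\} \subseteq l^\infty.$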


For simplicity, we consider only the required changes to Theorem 2, since this
result has a simpler statement than Theorem 1. In fact, with the assumptions
of Theorem 2, but with $f : l^1 \times l^\infty \to \mathbb{R}$ given by $f(e,u) = \|e\|_1 + \zeta\|u\|_\infty$, and
all references to $\|u\|_1$ converted to $\|u\|_\infty$, the form of Theorem 2 is unaltered,
except that the $\rho_0$ of Lemma 14 is not available, but another value may be
used (namely, $\rho_0 = (\alpha' + 1)\max\{1, \zeta^{-1}\}$).


Theorem 3. Let f : I’ x I — R have the form f(e,u) := |lell, + Cllulloc 
for some ¢ > 0. Assume Conditions 1, 1 and 1 of Theorem 1 hold (where 
the interiority in Condition 1 is relative to the 1°-norm) and Condition 1 is 
replaced by 
o/:= inf f <oo for some no. 
ng IM 


Define € as in (4.27), with 


1 
aa 5 min {aK 6 H— w|ldljafe(1 + ag’) + [élla + Wella] 


r= max, ¢} (maxfe + [lélla, H+ llalloo} + 2s) and 
t := max{e + |/E|]1 +25, D+ |lUloo + 25}. 


Then for any fixed p satisfying 


p> mx {2-4 (+1) max{1,¢~4}, Sn {1411201} 


i=0 


and alln > no for which 


S- max{|A{?|, [BO |} < s, 


i>n 


we obtain 


: : 2r+s+p - 2 
iB t Bid] SG AD BML 
i>n 
Proof. Since the form of the convergence theory of Section 4.5 (and the
associated CQs) is insensitive to the particular norm used, all we need to do
to obtain a proof in this context is to reinterpret all statements in u-space
as referring instead to the norm $\|\cdot\|_\infty$. The only places where changes occur
are in Lemma 7, where all references to $\|u\|_1$ become $\|u\|_\infty$; and instead of
using $\rho_0$ from Lemma 14, we exploit the form of $f$ to ensure the existence of
some $\rho_0$ for which Condition (4.15) of Proposition 4 is valid.


4.8 Appendix 


Remark 7. (A comment on z-transforms)
Let $w = \{w_n\}_{n=0}^{\infty}$ be a sequence with a rational z-transform $\hat w$. On a small
neighborhood of 0 in the complex plane, this is given by a power series
$\sum_{n=0}^{\infty} w_n z^n$ for such $z$. However, by rationality, $\hat w$ has an extension to the whole
plane by its functional form, and so is extended beyond the domain of
convergence of the power series.

For example, if $w = (1,1,\ldots)$, then for $|z| < 1$, $\hat w(z) = \sum_{n=0}^{\infty} z^n = 1/(1-z)$,
but the function taking $z$ to $1/(1-z)$ is defined on the whole plane except at
1 and so constitutes an extension in the above sense. We shall always assume
that such transforms have been so extended.
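The distinction drawn in Remark 7 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `partial_sum` and `w_hat` and the two test points are illustrative assumptions. It compares partial sums of the power series for $w = (1,1,\ldots)$ with the rational form $1/(1-z)$, once inside and once outside the unit disk.

```python
def partial_sum(z, terms):
    """Partial sum of sum_{n=0}^{terms-1} z**n, the series for w = (1, 1, ...)."""
    total = 0j
    power = 1 + 0j
    for _ in range(terms):
        total += power
        power *= z
    return total


def w_hat(z):
    """Rational (functional) form 1 / (1 - z); defined for every z != 1."""
    return 1 / (1 - z)


inside = 0.5 + 0.3j   # hypothetical point with |z| < 1: the series converges
outside = 2.0 + 0.0j  # hypothetical point with |z| > 1: the series diverges

# Inside the disk the partial sums approach the rational form.
gap_inside = abs(partial_sum(inside, 200) - w_hat(inside))

# Outside the disk the rational form is still finite, while partial sums grow.
value_outside = w_hat(outside)
growth_outside = abs(partial_sum(outside, 50))
```

Inside the disk the gap is at rounding level; outside, only the functional form gives a usable value, which is the sense in which the transform is "extended".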


Proof of Lemma 1: Let (e,u) € Fo and write u = t+ u-c+ uss and 
w = w+w-ct+wss, with & and @ in /'. The ensuing relation d*e+n*u = dew 
implies that 


(wed — uch)é + (wed — ush)8 = dé + Au — wd € 


and so cannot have any poles in the closed disk D. Since 


it follows that 


(wed(z) — ucit(z))(1 — zcos 8) + (wsd(z) — ust(z))z sin @ 
(z — a)(z —@) 
has no poles in D and hence none at all in C, and must be a polynomial (so 


that dx (w.c+w,s) —n*(u-c+uss) has finite length). If now @ = 0, so s = 0, 
the above amounts to stating that 


wWed(z) — uchi(z) 


1l—z 
is polynomial, from which we obtain ue = wed(1)/A(1) = we/P(1). Sim- 
Wed—Uch 
A Ts 
Ue = w-d(—1)/nr(—1) = w./P(-1). For other values of 6, similar reason- 
ing implies 


ilarly, if @ = m (so again s = 0), we have polynomial, so that 


(w.d(a) — ug(a))(1 — acos 6) + (wsd(a) — usft(a))asin 6 = 0. 


Rearranging terms and taking real and imaginary parts yields


wsasin 0+w-.(1—acos 8) 
Pla) and 


us cos @sin@ + u,sin?@ = Re 


wsasin 0+w.(1l—acos 0) 
P(a) , 


us sin?6—u,-sin@cos@ = Im 


with the right-hand side vanishing if $P$ has a pole at $a$. Solving this for $u_c$
and $u_s$ gives the desired relation.

Proof of Lemma 2: It suffices to show that $M_r \subseteq F$, since the reverse
inclusion is easy to demonstrate. Let (e, vu) € M,. Define R := 9/i—é/(iwdd). 
Then R # 9/A since e £ 0. If we can prove that R € Roo then é = wd(g— Ri) 
and from the convolution relation, @ = @.¢+ 8.8 + wd(@ + Rd) and thus 
(e,u) € F. 

Now, in the closed unit disk D, the only candidates for poles of R are the 
poles/zeros of P and the zeros of w. The proof now proceeds by successive 
elimination of these possibilities. 

If p is such a pole (so d(p) = 0 and f(p) ¥ 0), then, if p is not a pole for wv, 
w(p) is finite and nonzero (from the assumptions on pole/zero positioning), 
so the nonsingularity of R at p follows from that of é/ d, which itself is implied 
by the interpolation constraint é(p) = 0, whereby é has a zero at p cancelling 
the simple zero at p for d. If p is also a pole of (so p = a or p = G) then 
wid(-) is finite and nonzero at a and G, so R has no pole there, regardless of 
the value of é(a). 

If Z is a zero in D for P (so d(z) # 0 and BO) = 0), then again W is 
finite and nonzero here. Now R is expressible as (1 - £) - 7 at least in a 
punctured neighborhood of Z, where we used the nsiation on + gd = = 1. The 
interpolation constraint $\hat e(\bar z) = \hat w(\bar z) \ne 0$ means that $1 - \hat e/\hat w$ has a zero at
$\bar z$, cancelling the simple zero of $\hat n \hat d$ there. Again the singularity of $R$ here is
removable.

If $\bar v$ is a zero of $\hat w$, it is neither a pole nor a zero of $P$ (by assumption) so
$\hat n(\bar v)$ and $\hat d(\bar v)$ are both nonzero. The constraint $\hat e(\bar v) = 0$ then implies that
$\hat e/\hat w$, and hence $R$, has a removable singularity there. Thus $R$ has no poles in
$D$ as claimed.


The proof of Lemma 4 will follow from the next two elementary lemmas. 


Lemma 19. Let $\hat f \in \hat l^1(\mathbb{C})$ and let $|a| < 1$ be a zero of $\hat f$. Then
$\hat f(\cdot)/(\cdot - a) \in \hat l^1(\mathbb{C})$ and is in $\hat l^1$ if $f \in l^1$ and $a \in \mathbb{R}$.


Proof. The indicated function is analytic on a neighborhood of 0 in the plane, 
so gq := Z71[f(-)/(- — a)] exists as a complex sequence. Assume now that 
a # 0 (the argument for a = 0 is trivial and hence omitted). Then since 


St a ee ee 
Qe = — geet Dy-0 Sie? = gest Dyse hyo? since f(@) = 


Then, where an interchange of the order of summation occurs below, 


cae | oe 1 
lalla < » 4 [a 2 [fille = om |f;llal’ S- ja/eet 
= j j=l k<j 


Ill 
<D lal? e il <a) < =a 
Lemma 20. If $\hat f \in \hat l^1(\mathbb{C})$ and $|a| > 1$, then $\hat f(\cdot)/(\cdot - a) \in \hat l^1(\mathbb{C})$ and is in $\hat l^1$
if $f \in l^1$ and $a \in \mathbb{R}$.


Proof. As in the preceding lemma, let q be the inverse transform of the func- 
tion in question. From the expression for gq, given there, and an interchange 
of summations, 


lal < So Lfillal es = Do Wille? — 2 Me, 
j=0 ay | 5=0 lal (1 i ) ' 


la] 


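Proof of Lemma 4: By Lemma 20, no generality is lost by assuming that
$p$ has no zeros outside the open unit disk. Write $p(z) = C \prod_i (z - a_i)(z - \bar a_i)$
where $a_i$ and $\bar a_i$ are the zeros of $p$, with the understanding that for real $a_i$
we only include the single factor $z - a_i$ in the above product. Also, we allow
the possibility that the $a_i$ are nondistinct. Now, as $\hat f(a_1) = 0$, Lemma 19
implies that $\hat f(\cdot)/(\cdot - a_1) \in \hat l^1(\mathbb{C})$. If $a_1$ is real, then the function is in $\hat l^1$; if
complex, then $a_1 \ne \bar a_1$, and since $\hat f(\bar a_1) = 0$, $\hat f(\cdot)/(\cdot - a_1)$ has a zero at $\bar a_1$
and by Lemma 19 again, $\hat f(\cdot)/((\cdot - a_1)(\cdot - \bar a_1)) \in \hat l^1(\mathbb{C})$ and hence is also in $\hat l^1$ since it
is a symmetric function of $z$. Continue inductively to complete the proof.

Proof of Lemma 3: Let $b_1,..,b_p \in \mathbb{R}$ and $a_1,..,a_q \in \mathbb{C} \setminus \mathbb{R}$ denote the
collection of poles/zeros of $P$ in $D$, the zeros of $\hat w$ in $D$, along with the zeros
of $P$ outside $D$. From the assumed distinctness of these, it follows that the
corresponding square matrix

    $\begin{pmatrix}
    1 & b_1 & \cdots & b_1^{p+2q-1} \\
    \vdots & \vdots & & \vdots \\
    1 & b_p & \cdots & b_p^{p+2q-1} \\
    1 & \operatorname{Re} a_1 & \cdots & \operatorname{Re} a_1^{p+2q-1} \\
    \vdots & \vdots & & \vdots \\
    1 & \operatorname{Re} a_q & \cdots & \operatorname{Re} a_q^{p+2q-1} \\
    0 & \operatorname{Im} a_1 & \cdots & \operatorname{Im} a_1^{p+2q-1} \\
    \vdots & \vdots & & \vdots \\
    0 & \operatorname{Im} a_q & \cdots & \operatorname{Im} a_q^{p+2q-1}
    \end{pmatrix}$

has full rank over $\mathbb{R}$.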
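The full-rank claim for this interpolation matrix (rows of powers of the real nodes, followed by real and imaginary parts of powers of the non-real nodes) can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' argument: the node values are hypothetical, and `interpolation_matrix` and `rank` are helper names introduced here. The rank is estimated by Gaussian elimination with partial pivoting and a pivot tolerance.

```python
def interpolation_matrix(real_nodes, complex_nodes):
    """Rows (b**k), (Re a**k), (Im a**k) for k = 0, ..., p + 2q - 1."""
    size = len(real_nodes) + 2 * len(complex_nodes)
    rows = [[float(b) ** k for k in range(size)] for b in real_nodes]
    rows += [[(a ** k).real for k in range(size)] for a in complex_nodes]
    rows += [[(a ** k).imag for k in range(size)] for a in complex_nodes]
    return rows


def rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for col in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][col] / m[r][col]
            for j in range(col, n_cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r


# Hypothetical distinct nodes: p = 2 real, q = 2 non-real.
distinct = interpolation_matrix([0.5, -0.25], [0.3 + 0.4j, -0.2 + 0.6j])
rank_distinct = rank(distinct)

# A repeated real node produces two identical rows, so full rank is lost.
repeated = interpolation_matrix([0.5, 0.5], [0.3 + 0.4j, -0.2 + 0.6j])
rank_repeated = rank(repeated)
```

The second case shows why distinctness of the nodes is needed: the matrix is an invertible real recombination of a Vandermonde matrix on the nodes $b_i$, $a_j$, $\bar a_j$.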

Now, note that if $\bar z \notin D$ is a zero of $P$, then $\hat w(\bar z)$ is finite. To see this,
note that $\hat d(\bar z) \ne 0$, so by (2), $(w - w_c c - w_s s)^{\wedge}$ has no pole at $\bar z$, and hence
neither has $\hat w$, since $\bar z$ equals neither $a$ nor its conjugate. Thus $\hat w(a_i)$ and $\hat w(b_i)$
are all finite. By the surjectivity of the above matrix, and constructing an
appropriate real $(p+2q)$-vector from $\hat w(a_i)$ and $\hat w(b_i)$, we can find a nonzero
$e \in l^1$ of finite length such that all the interpolation constraints of $M$ are
satisfied, with, furthermore,

    $\hat e(\bar z) = \hat w(\bar z)$ at each zero $\bar z \notin D$ of $P$.

We now seek $u \in l^1$ of finite length such that
thd — A(Beé + Bo8) — 
n 
wd + {d(weé + we8) — Ai(Bo@+ 8,8) } — dé 


n 


a= 


where & := w — w.c — wss. From the definition of 3. and @, (see Lemma 1), 
it follows that d(weé + wWsS) — (3-6 + G58) has no pole at a (nor at @) and 
so must be polynomial, and hence the numerator in the above expression for 
a is polynomial, since iid and é are. To show that & is polynomial, we only 
need to see that the numerator has a zero at each zero of 7. (Recall that 
we assumed all these rational transforms to be extended by their functional 
form to the whole complex plane, as in Remark 7). Indeed, let Z be a zero of 
nh (that is, of P). Then z # a, @ and é(Z) = wW(Z), and since ¢(Z) and §(Z) are 
both finite, the numerator evaluated at Z is 


(2)d(Z}-d(z)(wee(z) + w58(Z)) + A(Z)(Be(Z) + B.8(Z)) — ez) A(z) 
= to(2)d(z) — d(z)(wee(z) + ws8(2)) — (2) dz) 
= (18(2) — (2))d(z) =0 


as claimed. Thus $u$ has finite length.

Chapter 5
Asymptotical stability of optimal
paths in nonconvex problems


Musa A. Mamedov 


Abstract In this chapter we study the turnpike property for the nonconvex
optimal control problems described by the differential inclusion $\dot x \in a(x)$. We
study the infinite horizon problem of maximizing the functional $\int_0^T u(x(t))\,dt$
as $T$ grows to infinity. The purpose of this chapter is to avoid the convexity
conditions usually assumed in turnpike theory. A turnpike theorem is proved
in which the main conditions are imposed on the mapping $a$ and the function
$u$. It is shown that these conditions may hold for mappings $a$ with nonconvex
images and for nonconcave functions $u$.


Key words: Turnpike property, differential inclusion, functional 


5.1 Introduction and background 


Let $x \in \mathbb{R}^n$ and $\Omega \subset \mathbb{R}^n$ be a given set. Denote by $\Pi_c(\mathbb{R}^n)$ the set of all
compact subsets of $\mathbb{R}^n$. We consider the following problem:

    $\dot x \in a(x), \quad x(0) = x^0,$    (5.1)

    $J_T(x(\cdot)) = \int_0^T u(x(t))\,dt \to \max.$    (5.2)

Here $x^0 \in \Omega \subset \mathbb{R}^n$ is an assigned initial point. The multivalued mapping
$a : \Omega \to \Pi_c(\mathbb{R}^n)$ has compact images and is continuous in the Hausdorff
metric. We assume that at every point $x \in \Omega$ the set $a(x)$ is uniformly
locally-connected (see [3]). The function $u : \Omega \to \mathbb{R}^1$ is a given continuous
function.

In this chapter we study the turnpike property for the problem given by
(5.1) and (5.2). The term 'turnpike property' was first coined by Samuelson in
1958 [16] when he showed that an efficient expanding economy would spend
most of its time in the vicinity of a balanced equilibrium path. This property
was further investigated by Radner [13], McKenzie [11], Makarov and
Rubinov [7] and others for optimal trajectories of a von Neumann-Gale model
with discrete time. In all these studies the turnpike property was established
under some convexity assumptions.

In [10] and [12] the turnpike property was defined using the notion of 
statistical convergence (see [4]) and it was proved that all optimal trajectories 
have the same unique statistical cluster point (which is also a statistical limit 
point). In these works the turnpike property is proved when the graph of the 
mapping a does not need to be a convex set. 

The turnpike property for continuous-time control systems has been stud- 
ied by Rockafellar [14], [15], Cass and Shell [1], Scheinkman [18], [17] and 
others who imposed additional conditions on the Hamiltonian. To prove a 
turnpike theorem without these kinds of additional conditions has become a 
very important problem. This problem has recently been further investigated 
by Zaslavsky [19], [21], Mamedov [8], [9] and others. The theorem proved in 
the current chapter was first given as a short note in [8]. In this work we give 
the proof of this theorem and explain the assumptions used in the examples. 


Definition 1. An absolutely continuous function $x(\cdot)$ is called a trajectory
(solution) to the system (5.1) in the interval $[0,T]$ if $x(0) = x^0$ and almost
everywhere on the interval $[0,T]$ the inclusion $\dot x(t) \in a(x(t))$ is satisfied.

We denote the set of trajectories defined on the interval $[0,T]$ by $X_T$ and let

    $J_T^* = \sup_{x(\cdot) \in X_T} J_T(x(\cdot)).$

We assume that the trajectories of system (5.1) are uniformly bounded, that
is, there exists a number $L < +\infty$ such that

    $\|x(t)\| \le L$ for all $t \in [0,T]$, $x(\cdot) \in X_T$, $T > 0$.    (5.3)


Note that in this work we focus our attention on the turnpike property of
optimal trajectories, so we do not consider the existence of bounded
trajectories defined on $[0, \infty)$. This issue has been studied for different control
problems by Leizarowitz [5], [6], Zaslavsky [19], [20] and others.


Definition 2. The trajectory $x(\cdot)$ is called optimal if $J_T(x(\cdot)) = J_T^*$ and is
called $\xi$-optimal ($\xi > 0$) if

    $J_T(x(\cdot)) \ge J_T^* - \xi.$

Definition 3. The point $x$ is called a stationary point if $0 \in a(x)$.


Stationary points play an important role in the study of the asymptotical
behavior of optimal trajectories. We denote the set of stationary points by
$M$:

    $M = \{x \in \Omega : 0 \in a(x)\}.$


We assume that the set $M$ is bounded. This is not a hard restriction, because
we consider uniformly bounded trajectories and so the set $\Omega$ can be taken as
a bounded set. Since the mapping $a(\cdot)$ is continuous the set $M$ is also closed.
Then $M$ is a compact set.


Definition 4. The point $x^* \in M$ is called an optimal stationary point if

    $u(x^*) = \max_{x \in M} u(x) =: u^*.$


In turnpike theory it is usually assumed that the optimal stationary point 
x* is unique. We also assume that the point x* is unique, but the method 
suggested here can be applied in the case when we have several different 
optimal stationary points. 


5.2 The main conditions of the turnpike theorem 


Turnpike theorems for the problem (5.1), (5.2) have been proved in [14], [18]
and elsewhere, where it was assumed that the graph of the mapping a is a 
compact convex set and the function u is concave. The main conditions are 
imposed on the Hamiltonian. In this chapter a turnpike theorem is presented 
in which the main conditions are imposed on the mapping a and the function 
u. Here we present a relation between a and u which provides the turnpike 
property without needing to impose conditions such as convexity of the graph 
of a and of the function u. On the other hand this relation holds if the graph 
of a is a convex set and the function u is concave. 

Condition M There exists $b < +\infty$ such that for every $T > 0$ there is a
trajectory $x(\cdot) \in X_T$ satisfying the inequality

    $J_T(x(\cdot)) \ge u^* T - b.$

Note that satisfaction of this condition depends in an essential way on the
initial point $x^0$, and in a certain sense it can be considered as a condition
for the existence of trajectories converging to $x^*$. Thus, for example, if there
exists a trajectory that hits $x^*$ in a finite time, then Condition M is satisfied.
Set

    $B = \{x \in \Omega : u(x) \ge u^*\}.$

We fix $p \in \mathbb{R}^n$, $p \ne 0$, and define a support function

    $c(x) = \max_{y \in a(x)} py.$

Here the notation $py$ means the scalar product of the vectors $p$ and $y$. By $|c|$
we will denote the absolute value of $c$.

We also define the function

    $\varphi(x,y) = \frac{u(x) - u^*}{|c(x)|} + \frac{u(y) - u^*}{c(y)}.$

Condition H There exists a vector $p \in \mathbb{R}^n$ such that

H1 $c(x) < 0$ for all $x \in B$, $x \ne x^*$;

H2 there exists a point $\tilde x \in \Omega$ such that $p\tilde x = px^*$ and $c(\tilde x) > 0$;

H3 for all points $x, y$, for which

    $px = py, \quad c(x) < 0, \quad c(y) > 0,$

the inequality $\varphi(x,y) < 0$ is satisfied; and also if

    $x_k \to x^*, \quad y_k \to y' \ne x^*, \quad px_k = py_k, \quad c(x_k) < 0$ and $c(y_k) > 0,$

then $\limsup_{k \to \infty} \varphi(x_k, y_k) < 0$.


Note that if Condition H is satisfied for some vector $p$ then it is also satisfied
for all $\lambda p$ ($\lambda > 0$). That is why we assume that $\|p\| = 1$.

Condition H1 means that derivatives of the system (5.1) are directed to
one side with respect to $p$, that is, if $x \in B$, $x \ne x^*$, then $py < 0$ for all
$y \in a(x)$. It is also clear that $py \le 0$ for all $y \in a(x^*)$ and $c(x^*) = 0$.

Condition H2 means that there is a point $\tilde x$ on the plane
$\{x \in \mathbb{R}^n : p(x - x^*) = 0\}$ such that $py > 0$ for some $y \in a(\tilde x)$. This is not a restrictive
assumption, but the turnpike property may fail if this condition does
not hold.

The main condition here is H3. It can be considered as a relation between
the mapping $a$ and the function $u$ which provides the turnpike property.

Note that Conditions H1 and H3 hold if the graph of the mapping $a$ is a
convex set (in $\mathbb{R}^n \times \mathbb{R}^n$) and the function $u$ is strictly concave. In the next
example we show that Condition H can hold for mappings $a$ without a convex
graph and for functions $u$ that are not strictly concave (in this example the
function $u$ is convex).


Example 1 Let $x = (x_1, x_2) \in \mathbb{R}^2$ and the system (5.1) have the form

    $\dot x_1 = \lambda[x_1^2 + (x_2 + 1)^2 + w], \quad -1 \le w \le +1,$
    $\dot x_2 = f(x_1, x_2, v), \quad v \in U \subset \mathbb{R}^m.$

Here $\lambda > 0$ is a positive number, the function $f(x_1, x_2, v)$ is continuous and
$f(0,0,\tilde v) = 0$ for some $\tilde v \in U$.

The mapping $a$ can be written as

    $a(x) = \{y = (y_1, y_2) : y_1 = \lambda[x_1^2 + (x_2 + 1)^2 + w],\ y_2 = f(x_1, x_2, v),$
    $\qquad\qquad -1 \le w \le +1,\ v \in U\}, \quad x = (x_1, x_2) \in \mathbb{R}^2.$

The function $u$ is given in the form

    $u(x) = c x_2 + d x_1^{2k}$, where $d > 0$, $c \ge 2d$, $k \in \{1,2,3,\ldots\}$.

We show that Condition H holds.

It is not difficult to see that the set of stationary points $M$ contains the
point $(0,0)$ and also $M \subset B_1(0,-1)$, where $B_1(0,-1)$ represents the closed disk
with center $(0,-1)$ and with radius 1. We have

    $u^* = \max_{x \in M} u(x) = u(0,0) = 0.$

Therefore $x^* = (0,0)$ is a unique optimal stationary point.

We fix the vector $p = (-1, 0)$ and calculate the support function
$c(x) = \max_{y \in a(x)} py$:

    $c(x) = -\lambda(x_1^2 + x_2^2 + 2x_2).$

Take any point $x = (x_1, x_2) \in B = \{x : u(x) \ge 0\}$ such that
$x \ne x^* = (0,0)$. Clearly $x \notin B_1(0,-1)$ and therefore $c(x) < 0$. Then Condition
H1 holds. Condition H2 also holds, because, for example, for the point
$\tilde x = (0,-1)$ for which $p\tilde x = 0$ we have $c(\tilde x) = \lambda > 0$.

Now we check Condition H3.

Take any two points $x = (x_1, x_2)$, $y = (y_1, y_2)$ for which $px = py$, $c(x) < 0$,
$c(y) > 0$.

From $px = py$ we have $x_1 = y_1$. Denote $\xi = x_1 = y_1$. Since $c(y) > 0$ and
$\lambda > 0$ we obtain $\xi^2 + (y_2 + 1)^2 < 1$. Therefore $|\xi| < 1$ and $y_2 + (1/2)\xi^2 < 0$.
On the other hand

    $u(y) - u^* = c y_2 + d\xi^{2k} \le c y_2 + d\xi^2 \le c[y_2 + (1/2)\xi^2] < 0.$

If $u(x) < 0$, from the expression of the function $\varphi(x,y)$ we obtain that
$\varphi(x,y) < 0$. Consider the case $u(x) \ge 0$.

Since $c(x) < 0$ and $u(x) \ge 0$, we have $|c(x)| = \lambda(\xi^2 + x_2^2 + 2x_2)$,
$x_2 + (1/2)\xi^2 + (1/2)x_2^2 > 0$, and

    $u(x) - u^* = c x_2 + d\xi^{2k} \le c x_2 + d\xi^2 \le c[x_2 + (1/2)\xi^2].$

Thus

    $\frac{u(x) - u^*}{|c(x)|} \le \frac{c(x_2 + (1/2)\xi^2)}{\lambda(\xi^2 + x_2^2 + 2x_2)} \le \frac{c}{2\lambda},$

    $\frac{u(y) - u^*}{c(y)} \le \frac{c(y_2 + (1/2)\xi^2)}{-\lambda(\xi^2 + y_2^2 + 2y_2)} < -\frac{c}{2\lambda}.$

From these inequalities we have $\varphi(x,y) < 0$, that is, the first part of H3
holds. The second part of Condition H3 may also be obtained from these
inequalities. Therefore Condition H holds.
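The first part of Condition H3 in Example 1 can also be spot-checked numerically. The following is a rough sketch under stated assumptions, not a proof: the parameter values $\lambda = 1$, $c = 2$, $d = 1$, $k = 2$, the sampling box, and the helper names `support`, `utility`, `phi`, and `worst_phi` are all chosen here for illustration. It samples pairs $x$, $y$ with $px = py$, $c(x) < 0$, $c(y) > 0$ and records the largest value of $\varphi(x,y)$ found.

```python
import random

# Hypothetical parameter choices satisfying lambda > 0, d > 0, c >= 2d, k >= 1.
LAM, C, D, K = 1.0, 2.0, 1.0, 2


def support(x1, x2):
    """c(x) = max over a(x) of p.y with p = (-1, 0); the maximum is at w = -1."""
    return -LAM * (x1 * x1 + x2 * x2 + 2.0 * x2)


def utility(x1, x2):
    """u(x) = c*x2 + d*x1**(2k); here u* = 0, so this is also u(x) - u*."""
    return C * x2 + D * x1 ** (2 * K)


def phi(x, y):
    """phi(x, y) = (u(x) - u*)/|c(x)| + (u(y) - u*)/c(y)."""
    return utility(*x) / abs(support(*x)) + utility(*y) / support(*y)


def worst_phi(samples=20000, seed=0):
    """Largest phi over sampled pairs with px = py, c(x) < 0 < c(y)."""
    rng = random.Random(seed)
    worst, count = float("-inf"), 0
    for _ in range(samples):
        xi = rng.uniform(-0.95, 0.95)   # shared first coordinate (px = py)
        x2 = rng.uniform(-3.0, 1.0)
        y2 = rng.uniform(-2.0, 0.0)
        x, y = (xi, x2), (xi, y2)
        # Small margins keep the sample away from the boundary c = 0.
        if support(*x) < -1e-3 and support(*y) > 2e-2:
            worst = max(worst, phi(x, y))
            count += 1
    return worst, count


worst_value, accepted = worst_phi()
```

A negative `worst_value` over all accepted pairs is consistent with the inequality $\varphi(x,y) < 0$; it does not replace the analytic argument above, and it says nothing about the limiting (second) part of H3.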

We now formulate the main result of the current chapter. 


Theorem 1. Suppose that Conditions M and H are satisfied and that the
optimal stationary point $x^*$ is unique. Then:

1. there exists $C < +\infty$ such that

    $\int_0^T [u(x(t)) - u^*]\,dt \le C$

for every $T > 0$ and every trajectory $x(t) \in X_T$;
2. for every $\varepsilon > 0$ and $\xi > 0$ there exists $K_{\varepsilon,\xi} < +\infty$ such that

    $\operatorname{meas}\{t \in [0,T] : \|x(t) - x^*\| \ge \varepsilon\} \le K_{\varepsilon,\xi}$

for every $T > 0$ and every $\xi$-optimal trajectory $x(t) \in X_T$;
3. if $x(t)$ is an optimal trajectory and $x(t_1) = x(t_2) = x^*$, then $x(t) \equiv x^*$
for $t \in [t_1, t_2]$.


The proof of this theorem is given in Section 7, and in Sections 3 to 6 we 
give preliminary results. 


5.3 Definition of the set D and some of its properties 


In this section a set $D$ is introduced. This set will be used in all sections
below.
Denote

    $M^* = \{x \in \Omega : c(x) \ge 0\}.$

Clearly $M \subset M^*$. We recall that $B = \{x \in \Omega : u(x) \ge u^*\}$.
Consider a compact set $D \subset \Omega$ for which the following conditions hold:
a) $x \in \operatorname{int} D$ for all $x \in B$, $x \ne x^*$;
b) $c(x) < 0$ for all $x \in D$, $x \ne x^*$;
c) $D \cap M^* = \{x^*\}$ and $B \subset D$.


It is not difficult to see that there exists a set $D$ with properties a), b) and
c). For example such a set can be constructed as follows. Let $x \in B$, $x \ne x^*$.
Then $c(x) < 0$. Since the mapping $a$ is continuous in the Hausdorff
metric the function $c(x)$ is continuous too. Therefore there exists $\varepsilon_x > 0$
such that $c(x') < 0$ for all $x' \in V_{\varepsilon_x}(x) \cap \Omega$. Here $V_\varepsilon(x)$ represents the open
$\varepsilon$-neighborhood of the point $x$. In this case for the set

    $D = \operatorname{cl}\Big( \bigcup_{x \in B,\ x \ne x^*} V_{\frac{1}{2}\varepsilon_x}(x) \cap \Omega \Big)$

Conditions a) to c) are satisfied.


Lemma 1. For every $\varepsilon > 0$ there exists $\nu_\varepsilon > 0$ such that

    $u(x) \le u^* - \nu_\varepsilon$

for every $x \in \Omega$, $x \notin \operatorname{int} D$ and $\|x - x^*\| \ge \varepsilon$.


Proof. Assume to the contrary that for some $\varepsilon > 0$ there exists a sequence $x_k$
such that $x_k \notin \operatorname{int} D$, $\|x_k - x^*\| \ge \varepsilon$ and $u(x_k) \to u^*$ as $k \to \infty$. Since the
sequence $x_k$ is bounded it has a limit point, say $x'$. Clearly $x' \ne x^*$, $x' \notin \operatorname{int} D$
and also $u(x') = u^*$, which implies $x' \in B$. This contradicts Condition a) of
the set $D$.


Lemma 2. For every $\varepsilon > 0$ there exists $\eta_\varepsilon > 0$ such that

    $c(x) < -\eta_\varepsilon$ for all $x \in D$, $\|x - x^*\| \ge \varepsilon$.


Proof. Assume to the contrary that for some $\varepsilon > 0$ there exists a sequence $x_k$
such that $x_k \in D$, $\|x_k - x^*\| \ge \varepsilon$ and $c(x_k) \to 0$. Let $x'$ be a limit point of the
sequence $x_k$. Then $x' \in D$, $x' \ne x^*$ and $c(x') = 0$. This contradicts Property
b) of the set $D$.

5.4 Transformation of Condition H3 


In this section we prove an inequality which can be considered as a transfor- 
mation of Condition H3. 

Take any number $\varepsilon > 0$ and denote $X_\varepsilon = \{x : \|x - x^*\| \ge \varepsilon\}$. Consider
the sets $D \cap X_\varepsilon$ and $M^* \cap X_\varepsilon$.

Let $x \in D \cap X_\varepsilon$, $y \in M^* \cap X_\varepsilon$ and $px = py$, with $c(x) < 0$ and $c(y) > 0$.
From Condition H3 it follows that

    $\varphi(x,y) < 0.$    (5.4)

We show that for every $\varepsilon > 0$ there exists $\delta_\varepsilon > 0$ such that

    $\varphi(x,y) < -\delta_\varepsilon$    (5.5)

for all $x \in D$ ($x \ne x^*$), $y \in M^* \cap X_\varepsilon$, for which $px = py$, $c(x) < 0$
and $c(y) > 0$.

First we consider the case where $x \in D \cap X_\varepsilon$. In this case if (5.5) is not
true then there exist sequences $(x_n)$ and $(y_n)$, for which

    $px_n = py_n, \quad c(x_n) < 0, \quad c(y_n) > 0, \quad x_n \in D \cap X_\varepsilon, \quad y_n \in M^* \cap X_\varepsilon$

and

    $x_n \to \bar x, \quad y_n \to \bar y, \quad \varphi(x_n, y_n) \to 0.$

From Lemma 17.4.1 it follows that the sequence $\{(u(x_n) - u^*)/|c(x_n)|\}$ is
bounded. Since $\varphi(x_n, y_n) \to 0$, the sequence $\{(u(y_n) - u^*)/c(y_n)\}$ is also
bounded and therefore from Lemma 17.3.1 we obtain $c(\bar y) > 0$. We also
obtain $c(\bar x) < 0$ from the inclusion $\bar x \in D \cap X_\varepsilon$. Thus the function $\varphi(x,y)$
is continuous at the point $(\bar x, \bar y)$. Then from $\varphi(x_n, y_n) \to 0$ it follows that
$\varphi(\bar x, \bar y) = 0$, which contradicts (5.4).

We now consider the case where $x \in D$, $x \ne x^*$. Assume that (5.5)
does not hold. Then there exist sequences $(x_n)$ and $(y_n)$, for which
$px_n = py_n$, $c(x_n) < 0$, $c(y_n) > 0$, $x_n \in D$, $y_n \in M^* \cap X_\varepsilon$ and $x_n \to \bar x$, $y_n \to \bar y$,
$\varphi(x_n, y_n) \to 0$. If $\bar x \ne x^*$, we have a contradiction similar to the first case. If
$\bar x = x^*$, taking the inequality $\bar y \ne x^*$ into account we obtain a contradiction
to the second part of Condition H3.

Thus we have shown that (5.5) is true.

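Define the function

    $\varphi(x,y)_{\delta_1,\delta_2} = \frac{u(x) - u^*}{|c(x)| + \delta_1} + \frac{u(y) - u^*}{c(y) + \delta_2}, \quad (\delta_1 \ge 0,\ \delta_2 \ge 0).$

Since the support function $c(\cdot)$ is continuous, from Conditions H2 and H3
it follows that there exists a number $b \in (0, +\infty)$ such that

    $\frac{u(x) - u^*}{|c(x)|} \le b$ for all $x \in D$, $x \ne x^*$.    (5.6)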
For a given ¢ > 0 we choose a number 7(¢) > 0 such that 
VY, 
Fe ap 5.7 
Me ce ee 


here the number vz is defined by Lemma (17.3.1). Clearly 


u(y) -—u* <-y, for allye M*N X,. (5.8) 
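By using $\gamma(\varepsilon)$ we divide the set $M^* \cap X_\varepsilon \cap \{y : c(y) > 0\}$ into two parts:

    $Y_1 = \{y \in M^* \cap X_\varepsilon \cap \{y : c(y) > 0\} : c(y) \ge \tfrac{1}{2}\gamma(\varepsilon)\},$

    $Y_2 = \{y \in M^* \cap X_\varepsilon \cap \{y : c(y) > 0\} : c(y) < \tfrac{1}{2}\gamma(\varepsilon)\}.$

Consider the set $Y_2$. Denote $\bar\delta_2 = \tfrac{1}{2}\gamma(\varepsilon)$ and take any number $\bar\delta_1 > 0$.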


Then 


u(x) — u* S 
———— <b forallreD, * 61 <6 
Ic(a)) +0) = or all x cA#u, 0, < 64 


and 


or ah 
ely) + 62 S e(y) + ba < 5 ye) + 5 le) = VC) 
for all 0 < 62 < 62, y € Yo. Using (5.8) we obtain 


u(y) — u" Me ye 
cy) + = cy) +0 = Ye) 


Thus from (5.9) and (5.10) we have 


P(X, Y)51,6. <b- ee 


v(é) 
for all (x,y) and (61, 62) satisfying 
veED, rc #2, yE Yo, c(x) <0, cly) >0, 
0< 6, <&, 0< b2 <b = 5 116). 


Now consider the set Yi. Since Y; is a bounded closed set and c(y) > 


for all y € Y9. 


(5.9) 


(5.10) 


(5.11) 


+(e) > 0 for all y € Yj, then the function (u(y) — u*)/(e(y) + 62) is 
uniformly continuous with respect to (y,62) on the closed set Y; x [0, L, 


where L > 0. That is why for a given number 6, there exists 54(e) such 


that 


uly)—u* uly) —u 
e(y) + 52 c(y) 


On the other hand 


u(x) —u* — u(x) — 


le(e)| +o = le) 


Then if px = py we have 


1 1 
P(t, Y)s162 S P(t, y)+ 5 oe S—be + 5 oe = — 5 be, 
for alla €D, «4 a*, u(x) > u*, ye VY, px = py, 


0 <6, <d,and0 < & < dd (e). 


‘L ra 
< 5 Oe for all y € Yi, 0 < 5g < d3(€). 


for alla € D, a # x*, u(x) > u*, 0 < 6) < dy. 
Now consider the case when $x \in D$ and $u(x) < u^*$. Since the function $c(y)$


is bounded on the set Yj, then for ¢ > 0 there exist 03(e) > 0 and 6: > 0 
such that 


u(y) —u ls 7) 
<< ~~ 6, for all Yi, 0 < d29 < d5(e). 
Ca ie aa ae 2 S O2(€) 
Then 
u(y) — u* 1 « 
< <——6,, 5.13 
P(2, Y) 51,52 = c(y) Ly 8 ( ) 


for all a € D, u(x) < u*, ye Yi, px = py, c(x) <0, 
0 <5, <5, and 0 < & < d2(c). 


Therefore from (5.11) to (5.13) we have 


VEU) sudo Ses (5.14) 
for alla eD, x1 4a*, ye M*N Xz, px = py, c(y) > 9, 


0 < Oy < 61 and 0 < b2 < d2(€). 
Here 6- = min{}6-, 45.} and 6:(e) = min{} +(e), d4(c), d3(2)}. 


Consider the function (u(x) — u*)/(\c(x)| + 61) with respect to (a, 61). 
From Lemma 17.4.1 we obtain that this function is continuous on the set 
(DN Xz) x [0,6;] for all € > 0. Then for a given number 4 4. > 0 there exists 
n = n(e,é) > 0 such that 


u(~)-—u* —u(a’)—u* 21 


<6, 
le(x)| +61 — |e(a’)| +61 ~ 2 


for all x € cl (V,(z')) = {z: ||[x —2'|| <n}, 2’ © DN X, and 0 < &; < by. 
If pa’ = py then we have 


y(z, 5 Gss s.> g(a’, Y) 61,6 + 


for all: 2 € cl (V,(a’)), 2° € DNA y EM AXE, pa!’ = py, cy) > 0, 0°< 
01 < 1 and 0 < b9 < do(é). 

Denote 6’(¢) = min{4 6-,52(e)}. Obviously 6’(e) > 0 if e > 0 and also for 
every <’ > 0 there exists 6’ > 0 such that 6’(e) > 6’ for all ¢ > e’. Therefore 
there exists a continuous function 6”(€), with respect to e, such that 


e 6"(e) < d(e) for all ¢ > 0 and 
e for every <” > 0 there exists 6’ > 0 such that 6”(¢) < 6” for alle > &”. 
Let $x' \in D$ and $y \in M^*$. Taking $\tilde\varepsilon = \|x' - x^*\|$ and $\varepsilon = \|y - x^*\|$, we
define the functions $\delta(y) = \delta''(\varepsilon) = \delta''(\|y - x^*\|)$ and $\eta(x',y) = \eta(\varepsilon, \tilde\varepsilon) =
\eta(\|y - x^*\|, \|x' - x^*\|)$. Clearly in this case the function $\delta(y)$ is continuous
with respect to $y$.

Thus the following lemma is proved. 


Lemma 3. Assume that at the point x € D,y € M* we have pa’ = 
py, C(x) < 0 and c(y) > 0. Then for every point x and numbers 61, 62 
satisfying 


xeéecl (Vite y)(2’)) 4 c(x) < 0, 0 < 61 < by and 0 < b2 < d2(y), 


the following inequality holds: 


FGI fal — aad (ara a ae =) ne 
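Here the functions $\eta(x', y)$ and $\delta(y)$ are such that $\delta(y)$ is continuous and for every $\varepsilon > 0$, $\tilde\varepsilon > 0$ there exist $\hat\delta > 0$ and $\hat\eta > 0$ such that

\[ \delta(y) \ge \hat\delta \quad \text{and} \quad \eta(x', y) \ge \hat\eta \]

for all $(x', y)$ for which $\|x' - x^*\| \ge \tilde\varepsilon$, $\|y - x^*\| \ge \varepsilon$.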


5.5 Sets of 1st and 2nd type: Some integral inequalities 


In this section, for a given trajectory $x(t) \in X_T$ we divide the interval $[0, T]$ into two types of intervals and prove some integral inequalities. We assume that $x(t)$ is a given continuously differentiable trajectory.


5.5.1


Consider the set $\{t \in [0, T] : x(t) \in \operatorname{int} D\}$. This set is open and therefore it can be presented as a union of a countable (or finite) number of open intervals $\tau_k = (t_1^k, t_2^k)$, $k = 1, 2, 3, \ldots$, where $\tau_k \cap \tau_l = \emptyset$ if $k \ne l$. We denote the set of intervals $\tau_k$ by $G = \{\tau_k : k = 1, 2, \ldots\}$.
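
As a numerical illustration of this decomposition (it is not part of the original argument), the following sketch samples a trajectory on a grid and groups consecutive grid points at which a membership test holds into maximal runs, which play the role of the intervals $\tau_k$. The function name `maximal_intervals` and the boolean mask `inside` (standing for the test $x(t) \in \operatorname{int} D$) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def maximal_intervals(ts: Sequence[float], inside: Sequence[bool]) -> List[Tuple[float, float]]:
    """Group consecutive grid points with inside[i] == True into maximal runs.

    Each run (ts[a], ts[b]) is a discrete stand-in for one interval tau_k;
    the runs are pairwise disjoint by construction.
    """
    if len(ts) != len(inside):
        raise ValueError("ts and inside must have the same length")
    intervals = []
    start = None  # index where the current run began, or None outside a run
    for i, flag in enumerate(inside):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            intervals.append((ts[start], ts[i - 1]))
            start = None
    if start is not None:  # a run that reaches the end of the grid
        intervals.append((ts[start], ts[-1]))
    return intervals


# Hypothetical membership pattern on the grid t = 0, 1, ..., 7.
G = maximal_intervals(list(range(8)), [False, True, True, False, False, True, True, True])
print(G)
```

On sampled data the runs are closed index ranges rather than open intervals, so the sketch only mirrors the combinatorial structure of $G$, not its topology.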

From the definition of the set $D$ we have

\[ p\dot{x}(t) \le c(x(t)) < 0 \quad \text{for all } t \in \tau_k,\ k = 1, 2, \ldots. \]

Then $px(t_1^k) > px(t_2^k)$ for all $k$.

We introduce the notation $p^i_{\tau_k} = px(t_i^k)$, $i = 1, 2$, $P_{\tau_k} = [p^2_{\tau_k}, p^1_{\tau_k}]$ and $P^0_{\tau_k} = (p^2_{\tau_k}, p^1_{\tau_k})$.


5.5.2


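We divide the set $G$ into the sets $g_m$ ($G = \cup_m g_m$), such that for every set $g = g_m$ the following conditions hold:


a) The set $g$ consists of a countable (or finite) number of intervals $\tau_k$, for which


i - related intervals $P^0_{\tau_k}$ are disjoint;

ii - if $[t^1_{\tau_k}, t^2_{\tau_k}] < [t^1_{\tau_l}, t^2_{\tau_l}]$ for some $\tau_k, \tau_l \in g$, then $P_{\tau_k} > P_{\tau_l}$ (here and henceforth the notation $[t^1_{\tau_k}, t^2_{\tau_k}] < [t^1_{\tau_l}, t^2_{\tau_l}]$ and $P_{\tau_k} > P_{\tau_l}$ means that $t^2_{\tau_k} \le t^1_{\tau_l}$ and $p^2_{\tau_k} \ge p^1_{\tau_l}$, respectively);

iii - if $[t^1_\tau, t^2_\tau] \subset [t^1_g, t^2_g]$ for $\tau \in G$, then $\tau \in g$; here $t^1_g = \inf_{\tau \in g} t^1_\tau$ and $t^2_g = \sup_{\tau \in g} t^2_\tau$.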


b) The set g is a maximal set satisfying Condition a); that is, the set g cannot 
be extended by taking other elements 7 € G such that a) holds. 


Sets $g_m$ ($G = \cup_m g_m$) satisfying these conditions exist. For example we can construct a set $g$ satisfying Conditions a) and b) in the following way.

Take any interval $\tau_k = (t_1^k, t_2^k)$. Denote by $t_2$ the middle point of the interval $[0, t_1^k]$: $t_2 = \tfrac{1}{2} t_1^k$. If for all intervals $\tau \in G$, for which $\tau \subset [t_2, t_1^k]$, Conditions i and ii hold, then we take $t_3 = \tfrac{1}{2} t_2$, otherwise we take $t_3 = \tfrac{1}{2}(t_2 + t_1^k)$. We repeat this process and obtain a convergent sequence $t_n$. Let $t_n \to t'$. In this case for all intervals $\tau \in G$, for which $\tau \subset [t', t_1^k]$, Conditions i and ii are satisfied.

Similarly, in the interval $[t_2^k, T]$ we find the point $t''$ such that for all intervals $\tau \in G$, for which $\tau \subset [t_2^k, t'']$, Conditions i and ii are satisfied. Therefore we obtain that for the set $g$ which consists of all intervals $\tau \in G$, for which $\tau \subset [t', t'']$, Condition a) holds. It is not difficult to see that for the set $g$ Condition b) also holds.

Thus we have constructed a set $g_1 = g$ satisfying Conditions a) and b). We can construct another set $g_2$ taking $G \setminus g_1$ in the same manner and so on.

Therefore $G = \cup_m g_m$, where for all $g_m$, Conditions a) and b) hold. For every set $g_m$ we have intervals $[t^1_{g_m}, t^2_{g_m}]$ (see iii) and $[p^2_{g_m}, p^1_{g_m}]$, where $p^1_{g_m} = \sup_{\tau \in g_m} p^1_\tau$ and $p^2_{g_m} = \inf_{\tau \in g_m} p^2_\tau$.


Definition 5. We say that gi < ge if: 


a) pe, <p, and. to 
b) there is no $g \in G$ which belongs to the interval $[t^2_{g_1}, t^1_{g_2}]$.

Take some set $g^1 \in G$ and consider all sets $g_m \in G$, $m = \pm 1, \pm 2, \pm 3, \ldots$, for which

\[ \cdots < g_{-2} < g_{-1} < g^1 < g_1 < g_2 < \cdots. \]

The number of sets $g_m$ may be at most countable. Denote $G_1 = \{g^1\} \cup \{g_m : m = \pm 1, \pm 2, \pm 3, \ldots\}$.

We take another set $g^2 \in G \setminus G_1$ and construct a set $G_2$ similar to $G_1$, and so on. We continue this procedure and obtain sets $G_i$. The number of these sets is either finite or countable. Clearly $G = \cup_i G_i$.

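Denote $t^1_{G_i} = \inf_{g \in G_i} t^1_g$ and $t^2_{G_i} = \sup_{g \in G_i} t^2_g$. Clearly

\[ \cup_i [t^1_{G_i}, t^2_{G_i}] \subset [0, T] \tag{5.15} \]

and $(t^1_{G_i}, t^2_{G_i}) \cap (t^1_{G_j}, t^2_{G_j}) = \emptyset$ for all $i \ne j$.

Proposition 1. Let $g_k \in G_i$, $k = 1, 2, \ldots$. Then


a) if $g_1 < g_2 < g_3 < \cdots$, then $x(t^2_{G_i}) = x^*$;
b) if $g_1 > g_2 > g_3 > \cdots$, then $x(t^1_{G_i}) = x^*$.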


Proof. Consider case a). Take some $g_k$. By Definition 5 it is clear that there exists an interval $\tau_k \in g_k$ and a point $t_k \in \tau_k$ such that

\[ x(t_k) \in D. \tag{5.16} \]

Consider the interval $[t^2_{g_k}, t^1_{g_{k+1}}]$. Since $g_k < g_{k+1}$, by Definition 5 we have $px(t^2_{g_k}) < px(t^1_{g_{k+1}})$. Therefore there exists a point $s_k \in (t^2_{g_k}, t^1_{g_{k+1}})$ such that $p\dot{x}(s_k) > 0$, which implies

\[ x(s_k) \in M^*. \tag{5.17} \]

On the other hand, $|t^2_{g_k} - t^1_{g_k}| \to 0$ and $|t^1_{g_{k+1}} - t^2_{g_k}| \to 0$, as $k \to \infty$.

Then $x(t) \to \bar{x}$ as $t \to t^2_{G_i}$. In this case $x(t_k) \to \bar{x}$ and $x(s_k) \to \bar{x}$ as $k \to \infty$.

Therefore from the definition of the set $D$ and from (5.16) and (5.17) we obtain $\bar{x} \in D \cap M^* = \{x^*\}$; that is, $\bar{x} = x^*$.
We can prove case b) in the same manner. 


5.5.3


Take the set $G_i$. We denote by $t_i^1$ an exact upper bound of the points $t^2_{G_m}$ satisfying $t^2_{G_m} \le t^1_{G_i}$ and by $t_i^2$ an exact lower bound of the points $t^1_{G_m}$ satisfying $t^1_{G_m} \ge t^2_{G_i}$.

Proposition 2. There exist points $t_i \in [t_i^1, t^1_{G_i}]$ and $t_i' \in [t^2_{G_i}, t_i^2]$ such that $x(t_i) = x(t_i') = x^*$.


Proof. First we consider the interval $[t_i^1, t^1_{G_i}]$. Two cases should be studied.


1. Assume that the exact upper bound $t_i^1$ is not reached. In this case there exists a sequence of intervals $[t^1_{G_m}, t^2_{G_m}]$ such that $t^2_{G_m} \to t_i^1$ and $t^1_{G_m} \to t_i^1$. Since the intervals $(t^1_{G_m}, t^2_{G_m})$ are disjoint we obtain that $x(t_i^1) = x^*$. Therefore $t_i = t_i^1$.

2. Assume that $t_i^1 = t^2_{G_m}$ for some $G_m$.


If there exists a sequence $g_k \in G_m$, $k = 1, 2, \ldots$, such that $g_1 < g_2 < \cdots$, then from Proposition 1 it follows that $x(t^2_{G_m}) = x^*$. So we can take $t_i = t^2_{G_m} = t_i^1$.

Therefore we can assume that the set $G_m$ consists of $g_{mk}$, where $g_{m1} > g_{m2} > \cdots$.

Now consider the set $G_i$. If in this set there exists a sequence $g_{ik}$ such that $g_{i1} > g_{i2} > \cdots$ then from Proposition 1 it follows that $x(t^1_{G_i}) = x^*$, so we can take $t_i = t^1_{G_i}$. That is why we consider the case when the set $G_i$ consists of $g_{ik}$, where $g_{i1} < g_{i2} < \cdots$.

Consider the elements $g_{m1}$ and $g_{i1}$. We denote $g_m = g_{m1}$ and $g_i = g_{i1}$.

The elements $g_m$ and $g_i$ belong to the different sets $G_m$ and $G_i$, so they cannot be compared by Definition 5. Since the second condition of this definition holds, the first condition is not satisfied, that is, $p^2_{g_m} \ge p^1_{g_i}$. This means that the intervals $\tau$ in $g_m$ and $g_i$ form only one element of type $g$. This is a contradiction.

Therefore we can take either $t_i = t_i^1$ or $t_i = t^1_{G_i}$.

Similarly we can prove the proposition for the interval $[t^2_{G_i}, t_i^2]$.


Note. We take $t_i = 0$ (or $t_i' = T$) if for the chosen set $G_i$ there does not exist $G_m$ such that $t^2_{G_m} \le t^1_{G_i}$ (or $t^1_{G_m} \ge t^2_{G_i}$, respectively).


Therefore the following lemma is proved. 


Lemma 4. The interval $[0, T]$ can be divided into an at most countable number of intervals $[0, t^1]$, $[t_k^1, t_k^2]$ and $[t^2, T]$, such that the interiors of these intervals are disjoint and
a) $[0, T] = [0, t^1] \cup \{\cup_k [t_k^1, t_k^2]\} \cup [t^2, T]$;
b) in each interval $[0, t^1]$, $[t_k^1, t_k^2]$ and $[t^2, T]$ there is only one set $G_0$, $G_k$ and $G_T$, respectively, and

\[ G = G_0 \cup \{\cup_k G_k\} \cup G_T; \]


c) $x(t_k^1) = x(t_k^2) = x^*$, $k = 1, 2, 3, \ldots$.


5.5.4 


In this subsection we give two lemmas. 


Lemma 5. Assume that the function $x(t)$ is continuously differentiable on the interval $[t_1, t_2]$ and $p_1 < p_2$, where $p_i = px(t_i)$, $i = 1, 2$. Then there exists an at most countable number of intervals $[t_1^k, t_2^k] \subset [t_1, t_2]$ such that
a) $p\dot{x}(t) > 0$, $t \in [t_1^k, t_2^k]$, $k = 1, 2, \ldots$;
b) $[p_1^k, p_2^k] \subset [p_1, p_2]$ and $p_1^k < p_2^k$ for all $k = 1, 2, \ldots$, where $p_i^k = px(t_i^k)$, $i = 1, 2$;
c) the intervals $(p_1^k, p_2^k)$ are disjoint and

\[ p_2 - p_1 = \sum_k (p_2^k - p_1^k). \]


Proof: For the sake of definiteness, we assume that $px(t) \in [p_1, p_2]$, $t \in [t_1, t_2]$. Otherwise we can consider an interval $[t_1', t_2'] \subset [t_1, t_2]$, for which $px(t) \in [p_1, p_2]$, $t \in [t_1', t_2']$ and $p_i = px(t_i')$, $i = 1, 2$.

We set $t(q) = \min\{t \in [t_1, t_2] : px(t) = q\}$ for all $q \in [p_1, p_2]$ and then define a set

\[ m = \{t(q) : q \in [p_1, p_2]\}. \]

Clearly $m \subset [t_1, t_2]$. Consider a function $a(t) = px(t)$ defined on the set $m$. It is not difficult to see that for every $q \in [p_1, p_2]$ there is only one number $t(q)$, for which $a(t(q)) = q$, and also $\dot{a}(t(q)) \ge 0$, $\forall q \in [p_1, p_2]$.

We divide the interval $[p_1, p_2]$ into two parts as follows:

\[ P_1 = \{q : \dot{a}(t(q)) = 0\} \quad \text{and} \quad P_2 = \{q : \dot{a}(t(q)) > 0\}. \]

Define the sets

\[ m(P_1) = \{t(q) : q \in P_1\} \quad \text{and} \quad m(P_2) = \{t(q) : q \in P_2\}. \]


We denote by $m_\Lambda$ a set of points $t \in [t_1, t_2]$ which cannot be seen from the left (see [2]). It is known that the set $m_\Lambda$ can be presented in the form $m_\Lambda = \cup_n (\alpha_n, \beta_n)$. Then we can write

\[ [t_1, t_2] = m(P_1) \cup m(P_2) \cup m_\Lambda \cup (\cup_n \{\alpha_n\}). \]


Let $q \in P_2$. Since the function $x(t)$ is continuously differentiable, there exists a number $\varepsilon > 0$, such that $V_\varepsilon(q) \subset P_2$, where $V_\varepsilon(q)$ stands for the open $\varepsilon$-neighborhood of the point $q$. Therefore the set $m(P_2)$ is an open set and that is why it can be presented as $m(P_2) = \cup_k (t_1^k, t_2^k)$.

Thus we have

\[ p_2 - p_1 = px(t_2) - px(t_1) = \int_{t_1}^{t_2} p\dot{x}(t)\,dt = \int_{m(P_1)} p\dot{x}(t)\,dt + \int_{m(P_2)} p\dot{x}(t)\,dt + \int_{m_\Lambda} p\dot{x}(t)\,dt + \int_{\cup_n \{\alpha_n\}} p\dot{x}(t)\,dt. \]

It is not difficult to observe that $p\dot{x}(t) = \dot{a}(t) = 0$, $\forall t \in m(P_1)$, $\operatorname{meas}(\cup_n \{\alpha_n\}) = 0$ and $px(\alpha_n) = px(\beta_n)$, $n = 1, 2, \ldots$ (see [2]). Then we


obtain

\[ p_2 - p_1 = \sum_k \bigl(px(t_2^k) - px(t_1^k)\bigr) = \sum_k (p_2^k - p_1^k). \]

Therefore for the intervals $[t_1^k, t_2^k]$ all assertions of the lemma hold.
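
The bookkeeping in Lemma 5 can be checked numerically on a sampled scalar signal $q_i = px(t_i)$. The sketch below is only a discrete analogue, under the assumption (as in the proof) that the samples stay in $[p_1, p_2]$ with first sample $p_1$ and last sample $p_2$: it keeps the steps on which the running maximum increases, which correspond to the first-hitting times $t(q)$, and confirms that the resulting value intervals are disjoint and that their lengths add up to $p_2 - p_1$. The helper name `record_steps` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def record_steps(q: Sequence[float]) -> List[Tuple[float, float]]:
    """Return value intervals (lo, hi) over which the running maximum of q grows.

    Discrete analogue of the intervals (p_1^k, p_2^k) in Lemma 5: each pair is
    produced by a step on which q sets a new record.
    """
    steps = []
    running_max = q[0]
    for value in q[1:]:
        if value > running_max:
            steps.append((running_max, value))
            running_max = value
    return steps


# Hypothetical samples of p x(t): start at p1 = 0, end at p2 = 3, with dips in between.
samples = [0.0, 1.0, 0.5, 2.0, 1.5, 1.0, 3.0]
steps = record_steps(samples)
total = sum(hi - lo for lo, hi in steps)
print(steps, total)
```

The dips (the portions that "cannot be seen from the left") contribute nothing to the total, which is the discrete counterpart of $px(\alpha_n) = px(\beta_n)$.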


Lemma 6. Assume that on the intervals $[t_1, t_2]$ and $[s_2, s_1]$ the following conditions hold:


1. $px(t_i) = px(s_i) = p_i$, $i = 1, 2$.

2. $x(t) \in \operatorname{int} D$, $\forall t \in (t_1, t_2)$. In particular, from this condition it follows that $p\dot{x}(t) < 0$, $\forall t \in (t_1, t_2)$.

3. $p\dot{x}(s) > 0$, $\forall s \in (s_2, s_1)$.


Then

\[ \int_{t_1}^{t_2} u(x(t))\,dt + \int_{s_2}^{s_1} u(x(s))\,ds \le u^*[(t_2 - t_1) + (s_1 - s_2)] - \int_{s_2}^{s_1} \delta^2(x(s))\,ds, \]

where the function $\delta(x)$ is as defined in Lemma 17.6.1.
Proof: Consider two cases.


I. Let $p^* = px^* \ne p_i$, $i = 1, 2$. We recall that $p_i = px(t_i)$, $i = 1, 2$. In this case from Conditions 2) and 3) we have

\[ \tilde\varepsilon = \rho(x^*, \{x(t) : t \in [t_1, t_2]\}) > 0 \quad \text{and} \quad \varepsilon = \rho(x^*, \{x(s) : s \in [s_2, s_1]\}) > 0. \]


A 
Now we use Lemma 17.5.1. We define 6 =§,> 0 and =1 z for the chosen 


numbers $\tilde\varepsilon$ and $\varepsilon$. We take any number $N > 0$ and divide the interval $[p_2, p_1]$ into $N$ equal parts $[p_2^k, p_1^k]$. From Conditions 2 and 3 it follows that in this case the intervals $[t_1, t_2]$ and $[s_2, s_1]$ are also divided into $N$ parts, say $[t_1^k, t_2^k]$ and $[s_2^k, s_1^k]$, respectively. Here $px(t_i^k) = px(s_i^k) = p_i^k$, $i = 1, 2$, $k = 1, \ldots, N$. Clearly

\[ p_1^k - p_2^k = \frac{p_1 - p_2}{N} \to 0 \quad \text{as } N \to \infty. \]

Since $x(t) \in D$ and $\|x(t) - x^*\| \ge \tilde\varepsilon > 0$ for all $t \in [t_1, t_2]$ then from Lemma 17.4.1 it follows that

\[ p\dot{x}(t) \le c(x(t)) < -\eta_{\tilde\varepsilon} < 0. \]


That is why for every $k$ we have $t_2^k - t_1^k \to 0$ as $N \to \infty$. Therefore for a given $\eta > 0$ there exists a number $N$ such that

\[ \max_{t, s \in [t_1^k, t_2^k]} \|x(t) - x(s)\| \le \eta \quad \text{for all } k = 1, \ldots, N. \tag{5.18} \]

Now we show that for all $k = 1, \ldots, N$

\[ s_1^k - s_2^k \to 0 \quad \text{as } N \to \infty. \tag{5.19} \]

Suppose that (5.19) is not true. In this case there exists a sequence of intervals $[s_2^{k_N}, s_1^{k_N}]$, such that $s_i^{k_N} \to \bar{s}_i$, $i = 1, 2$, and $\bar{s}_2 < \bar{s}_1$. Since $p_1^{k_N} - p_2^{k_N} \to 0$ as $N \to \infty$ then $px(\bar{s}_2) = px(\bar{s}_1) = p'$, and moreover $px(s) = p'$ for all $s \in [\bar{s}_2, \bar{s}_1]$. This is a contradiction. So (5.19) is true.


A. Now we take any number $k$ and fix it. For the sake of simplicity we denote the intervals $[t_1^k, t_2^k]$ and $[s_2^k, s_1^k]$ by $[t_1, t_2]$ and $[s_2, s_1]$, respectively. Let $p_i = px(t_i) = px(s_i)$, $i = 1, 2$.

Take any $s \in (s_2, s_1)$ and denote by $t'$ the point in the interval $(t_1, t_2)$ for which $px(t') = px(s)$. From (5.18) it follows that

\[ x(t) \in V_\eta(x(t')) \quad \text{for all } t \in [t_1, t_2]. \]

Therefore we can apply Lemma 17.5.1. We also note that the following conditions hold:


• $|c(x(t))| \le |p\dot{x}(t)|$ for all $t \in (t_1, t_2)$;
• $c(x(s)) \ge p\dot{x}(s)$ for all $s \in (s_2, s_1)$;
• $u(x(s)) \le u^*$ for all $s \in [s_2, s_1]$.

Then from Lemma 17.5.1 we obtain

\[ \frac{u(x(t))}{|p\dot{x}(t)| + \delta_1} + \frac{u(x(s))}{p\dot{x}(s) + \delta_2} \le u^* \left( \frac{1}{|p\dot{x}(t)| + \delta_1} + \frac{1}{p\dot{x}(s) + \delta_2} \right) - \delta(x(s)), \tag{5.20} \]

for all $t \in (t_1, t_2)$, $s \in (s_2, s_1)$, $\delta_1 \in (0, \bar\delta]$ and $\delta_2 \in (0, \delta(x(s))]$.
Denote $\xi = \min_{s \in [s_2, s_1]} \delta(x(s))$. Clearly $\xi > 0$. Since the function $\delta(x)$ is continuous there is a point $\bar{s} \in [s_2, s_1]$ such that $\xi = \delta(x(\bar{s}))$. Define

\[ \pi = px(t) + \bar\xi (t_1 - t), \quad t \in [t_1, t_2], \]

\[ \omega = px(s) + \xi (s - s_1), \quad s \in [s_2, s_1]. \]

Clearly

\[ d\pi = [p\dot{x}(t) - \bar\xi]\,dt \quad \text{and} \quad d\omega = [p\dot{x}(s) + \xi]\,ds, \]

where $\bar\xi = \xi (s_2 - s_1)/(t_1 - t_2)$.

Since $p\dot{x}(t) - \bar\xi < 0$ and $p\dot{x}(s) + \xi > 0$ then there exist inverse functions $t = t(\pi)$ and $s = s(\omega)$. We also note that $\pi_1 = px(t_1) = px(s_1) = \omega_1$, and

\[ \pi_2 = px(t_2) + \xi (s_2 - s_1) = \omega_2. \]
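Therefore we have

\[ A = \int_{t_1}^{t_2} u(x(t))\,dt + \int_{s_2}^{s_1} u(x(s))\,ds = \int_{\pi_1}^{\pi_2} \frac{u(x(t(\pi)))}{p\dot{x}(t(\pi)) - \bar\xi}\,d\pi + \int_{\omega_2}^{\omega_1} \frac{u(x(s(\omega)))}{p\dot{x}(s(\omega)) + \xi}\,d\omega \]

\[ = \int_{\omega_2}^{\omega_1} \left( \frac{u(x(t(\omega)))}{|p\dot{x}(t(\omega))| + \bar\xi} + \frac{u(x(s(\omega)))}{p\dot{x}(s(\omega)) + \xi} \right) d\omega. \]

Let $\delta_1 = \bar\xi$. Since

\[ \xi \le \delta(x(s)) = \delta(x(s(\omega))), \quad t(\omega) \in [t_1, t_2], \quad s(\omega) \in [s_2, s_1], \]

then from (5.20), applied with $\delta_2 = \xi$, we obtain

\[ A \le u^* \int_{\omega_2}^{\omega_1} \left( \frac{1}{|p\dot{x}(t(\omega))| + \bar\xi} + \frac{1}{p\dot{x}(s(\omega)) + \xi} \right) d\omega - \int_{\omega_2}^{\omega_1} \delta(x(s(\omega)))\,d\omega \]

\[ = u^* \left( \int_{t_1}^{t_2} dt + \int_{s_2}^{s_1} ds \right) - \int_{s_2}^{s_1} \delta(x(s)) [p\dot{x}(s) + \xi]\,ds \]

\[ \le u^*[(t_2 - t_1) + (s_1 - s_2)] - \xi \int_{s_2}^{s_1} \delta(x(s))\,ds. \]

On the other hand $\delta(x(s)) \ge \xi = \delta(x(\bar{s}))$. Thus

\[ A \le u^*[(t_2 - t_1) + (s_1 - s_2)] - (s_1 - s_2)\,\delta^2(x(\bar{s})). \]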


B. Now we consider different numbers $k$. The last inequality shows that for every $k = 1, \ldots, N$ there is a point $\bar{s}_k \in [s_2^k, s_1^k]$ such that

\[ \int_{t_1^k}^{t_2^k} u(x(t))\,dt + \int_{s_2^k}^{s_1^k} u(x(s))\,ds \le u^*[(t_2^k - t_1^k) + (s_1^k - s_2^k)] - (s_1^k - s_2^k)\,\delta^2(x(\bar{s}_k)). \]

Summing over $k$ we obtain

\[ \int_{t_1}^{t_2} u(x(t))\,dt + \int_{s_2}^{s_1} u(x(s))\,ds \le u^*[(t_2 - t_1) + (s_1 - s_2)] - \sum_{k=1}^{N} (s_1^k - s_2^k)\,\delta^2(x(\bar{s}_k)). \]

Therefore the lemma is proved taking into account (5.19) and passing to the limit as $N \to \infty$.


II. Now consider the case when $p^* = p_i$ for some $i = 1, 2$. For the sake of definiteness we assume that $p^* = p_1$.

Take any number $\alpha > 0$ and consider the interval $[p_2, p_1 - \alpha]$. Denote by $[t_1 - t(\alpha), t_2]$ and $[s_2, s_1 - s(\alpha)]$ the intervals which correspond to the interval $[p_2, p_1 - \alpha]$. Clearly $t(\alpha) \to 0$ and $s(\alpha) \to 0$ as $\alpha \to 0$. We apply the result proved in the first part of the lemma for the interval $[p_2, p_1 - \alpha]$. Then we pass to the limit as $\alpha \to 0$. Thus the lemma is proved.


5.5.5


We define two types of sets. 


Definition 6. The set $\pi \subset [0, T]$ is called a set of 1st type on the interval $[p_2, p_1]$ if the following conditions hold:


a) The set $\pi$ consists of two sets $\pi_1$ and $\pi_2$, that is, $\pi = \pi_1 \cup \pi_2$, such that $x(t) \in \operatorname{int} D$, $\forall t \in \pi_1$, and $x(t) \notin \operatorname{int} D$, $\forall t \in \pi_2$.

b) The set $\pi_1$ consists of an at most countable number of intervals $d_k$, with end-points $t_1^k < t_2^k$, and the intervals $(px(t_2^k), px(t_1^k))$, $k = 1, 2, \ldots$, are disjoint.

Clearly in this case the intervals $d_k^0 = (t_1^k, t_2^k)$ are also disjoint.
c) Both the inequalities $p_1 \ge \sup_k px(t_1^k)$ and $p_2 \le \inf_k px(t_2^k)$ hold.


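Definition 7. The set $\omega \subset [0, T]$ is called a set of 2nd type on the interval $[p_2, p_1]$ if the following conditions hold:
a) $x(t) \notin \operatorname{int} D$, $\forall t \in \omega$.
b) The set $\omega$ consists of an at most countable number of intervals $[s_2^k, s_1^k]$, such that the intervals $(px(s_2^k), px(s_1^k))$, $k = 1, 2, \ldots$, are nonempty and disjoint, and

\[ p_1 - p_2 = \sum_k [px(s_1^k) - px(s_2^k)]. \]

Lemma 7. Assume that $\pi$ and $\omega$ are sets of 1st and 2nd type on the interval $[p_2, p_1]$, respectively. Then

\[ \int_{\pi \cup \omega} u(x(t))\,dt \le u^* \operatorname{meas}(\pi \cup \omega) - \int_Q [u^* - u(x(t))]\,dt - \int_E \delta^2(x(t))\,dt, \]

where
a) $Q \cup E = \omega \cup \pi_2 = \{t \in \pi \cup \omega : x(t) \notin \operatorname{int} D\}$;
b) for every $\varepsilon > 0$ there exists a number $\delta_\varepsilon > 0$ such that

\[ \delta^2(x) \ge \delta_\varepsilon \quad \text{for all } x \text{ for which } \|x - x^*\| \ge \varepsilon; \]

c) for every $\delta > 0$ there exists a number $K(\delta) < \infty$ such that

\[ \operatorname{meas}[(\pi \cup \omega) \cap Z_\delta] \le K(\delta) \operatorname{meas}[(Q \cup E) \cap Z_\delta], \]

where $Z_\delta = \{t \in [0, T] : |px(t) - p^*| \ge \delta\}$.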


Proof. Let $\pi = \pi_1 \cup \pi_2$, $\pi_1 = \cup_k d_k$, $\cup_n \nu_n \subset \omega$ and $\nu_n = [s_2^n, s_1^n]$ (see Definitions 6 and 7).

We denote $\pi_1^0 = \cup_k d_k^0$ and $d_k^0 = (t_1^k, t_2^k)$. Clearly $\operatorname{meas} \pi_1 = \operatorname{meas} \pi_1^0$ and that is why below we deal with $d_k^0$.

Denote $p_i^n = px(s_i^n)$, $i = 1, 2$. Clearly $p_2^n < p_1^n$. Since the function $x(t)$ is absolutely continuous, from Lemma 5 it follows that there exists an at most countable number of intervals $[s_2^{nm}, s_1^{nm}] \subset [s_2^n, s_1^n]$, $m = 1, 2, \ldots$, such that

i - $p\dot{x}(s) > 0$, for all $s \in [s_2^{nm}, s_1^{nm}]$, $n, m = 1, 2, \ldots$;

ii - $[p_2^{nm}, p_1^{nm}] \subset [p_2^n, p_1^n]$ and $p_2^{nm} < p_1^{nm}$ for all $n, m$, here $p_i^{nm} = px(s_i^{nm})$, $i = 1, 2$;

iii - the intervals $(p_2^{nm}, p_1^{nm})$, $n, m = 1, 2, \ldots$, are disjoint and

\[ p_1^n - p_2^n = \sum_m (p_1^{nm} - p_2^{nm}). \]

Therefore the set $\omega$ contains an at most countable number of intervals $\nu_m = (s_2^m, s_1^m)$, such that:
1. the intervals $(p_2^m, p_1^m)$, $m = 1, 2, \ldots$, are disjoint (here $p_i^m = px(s_i^m)$, $i = 1, 2$, $m = 1, 2, \ldots$);
2.

\[ p\dot{x}(t) > 0, \quad \text{for all } t \in \cup_m (s_2^m, s_1^m). \tag{5.21} \]

Now we take some interval $d_k^0 = (t_1^k, t_2^k)$ and let $p_i^k = px(t_i^k)$, $i = 1, 2$. Denote

\[ [p_2^{km}, p_1^{km}] = [p_2^k, p_1^k] \cap [p_2^m, p_1^m]. \tag{5.22} \]

Since $p\dot{x}(t) < 0$, for all $t \in d_k^0$, from (5.21) it follows that there are two intervals $[t_1^{km}, t_2^{km}]$ and $[s_2^{km}, s_1^{km}]$ corresponding to the nonempty interval $[p_2^{km}, p_1^{km}]$, and

\[ \sum_m (t_2^{km} - t_1^{km}) = t_2^k - t_1^k. \]
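Applying Lemma 6 we obtain

\[ \int_{t_1^{km}}^{t_2^{km}} u(x(t))\,dt + \int_{s_2^{km}}^{s_1^{km}} u(x(s))\,ds \le u^*[(t_2^{km} - t_1^{km}) + (s_1^{km} - s_2^{km})] - \int_{s_2^{km}}^{s_1^{km}} \delta^2(x(s))\,ds. \]

Summing over $m$ and then over $k$ we have

\[ \int_{\pi_1} u(x(t))\,dt + \int_{\omega'} u(x(s))\,ds \le u^* \Bigl[ \sum_k (t_2^k - t_1^k) + \sum_{k,m} (s_1^{km} - s_2^{km}) \Bigr] - \sum_{k,m} \int_{s_2^{km}}^{s_1^{km}} \delta^2(x(s))\,ds. \]

Here $\omega' = \cup_{k,m} [s_2^{km}, s_1^{km}]$. Clearly $\omega' \subset \omega$. Therefore

\[ \int_{\pi \cup \omega} u(x(t))\,dt = \int_{\pi_1} u(x(t))\,dt + \int_{\omega'} u(x(t))\,dt + \int_{\omega \setminus \omega'} u(x(t))\,dt + \int_{\pi_2} u(x(t))\,dt \]

\[ \le u^* (\operatorname{meas} \pi_1 + \operatorname{meas} \omega') - \int_{\omega'} \delta^2(x(s))\,ds - \int_{\pi_2 \cup (\omega \setminus \omega')} [u^* - u(x(t))]\,dt + u^* [\operatorname{meas} \pi_2 + \operatorname{meas}(\omega \setminus \omega')] \]

\[ = u^* \operatorname{meas}(\pi \cup \omega) - \int_Q [u^* - u(x(t))]\,dt - \int_E \delta^2(x(t))\,dt, \]

where $Q = \pi_2 \cup (\omega \setminus \omega')$ and $E = \omega'$.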

Now we check Conditions a), b) and c) of the lemma. 

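Condition a) holds, because $Q \cup E = \pi_2 \cup (\omega \setminus \omega') \cup \omega' = \pi_2 \cup \omega$. Condition b) follows from Lemma 17.5.1. We now check Condition c).

Take any number $\delta > 0$ and denote $P_\delta = \{l : |l - p^*| \ge \delta\}$.

Consider the intervals $[p_2^{km}, p_1^{km}]$, $[p_2^k, p_1^k]$ and $[p_2^m, p_1^m]$ (see (6.11)) corresponding to the interval $d_k^0$, where

\[ p_1^k - p_2^k = \sum_m (p_1^{km} - p_2^{km}). \tag{5.23} \]

From Lemma 17.4.1 we have

\[ p\dot{x}(t) \le c(x(t)) < -\eta_\delta \quad \text{for all } t \in \pi_1 \cap Z_\delta. \tag{5.24} \]

On the other hand there exists a number $K < +\infty$, for which

\[ p\dot{x}(t) \le K \quad \text{for all } t \in [0, T]. \]

Therefore

\[ \operatorname{meas}([p_2^k, p_1^k] \cap P_\delta) = \int_{d_k^0 \cap Z_\delta} [-p\dot{x}(t)]\,dt \ge \eta_\delta \operatorname{meas}(d_k^0 \cap Z_\delta). \]

Summing over $k$ we have

\[ I = \sum_k \operatorname{meas}([p_2^k, p_1^k] \cap P_\delta) \ge \eta_\delta \operatorname{meas}(\pi_1 \cap Z_\delta). \tag{5.25} \]

On the other hand, from (5.23) it follows that

\[ I = \sum_{k,m} \operatorname{meas}([p_2^{km}, p_1^{km}] \cap P_\delta) = \sum_{k,m} \int_{[s_2^{km}, s_1^{km}] \cap Z_\delta} p\dot{x}(t)\,dt \le K \sum_{k,m} \operatorname{meas}([s_2^{km}, s_1^{km}] \cap Z_\delta) \le K \operatorname{meas}(\omega' \cap Z_\delta). \tag{5.26} \]

Thus from (5.25) and (5.26) we obtain

\[ \operatorname{meas}(\pi_1 \cap Z_\delta) \le \frac{K}{\eta_\delta} \operatorname{meas}(E \cap Z_\delta) \le \frac{K}{\eta_\delta} \operatorname{meas}[(Q \cup E) \cap Z_\delta]. \]

But $Q \cup E = \pi_2 \cup \omega$ and therefore

\[ \operatorname{meas}[(\pi \cup \omega) \cap Z_\delta] = \operatorname{meas}(\pi_1 \cap Z_\delta) + \operatorname{meas}[(Q \cup E) \cap Z_\delta] \le \frac{K}{\eta_\delta} \operatorname{meas}[(Q \cup E) \cap Z_\delta] + \operatorname{meas}[(Q \cup E) \cap Z_\delta]. \]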


Then Condition c) holds if we take $K(\delta) = K/\eta_\delta + 1$ and thus the lemma is proved.


5.6 Transformation of the functional (5.2)


In this section we divide the sets $G_0$, $G_k$ and $G_T$ (see Lemma 4) into sets of 1st and 2nd type such that Lemma 7 can be applied. Note that $x(t)$ is a continuously differentiable trajectory.


5.6.1 


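Lemma 8. Assume that the set $G_i$ consists of a finite number of elements $g_k$, $g_1 < g_2 < \cdots < g_N$. Then

\[ \int_{t^1_{g_1}}^{t^2_{g_N}} u(x(t))\,dt = \sum_k \int_{\pi_k \cup \omega_k} u(x(t))\,dt + \int_F u(x(t))\,dt. \]

Here $\pi_k$ and $\omega_k$ are the sets of 1st and 2nd type in the interval $[p_k^2, p_k^1]$ and the set $F$ is either a set of 1st type in the interval $[p^2_{g_N}, p^1_{g_1}]$, if $p^2_{g_N} \le p^1_{g_1}$, or is a set of 2nd type in the interval $[p^1_{g_1}, p^2_{g_N}]$, if $p^2_{g_N} > p^1_{g_1}$.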


Proof. Take the set g; and assume that [p2, Pol and [Faas bel are the corre- 
sponding intervals. Note that we are using the notation introduced in Section 
5.5. Take the set go. 


A. First we consider the case Por < D4 9, Mm this case there is a point 
i} € [é5,,#2,] such that px(t')) = pj. Denote wy = [t', 12.) and ow = [45,1 
Note that by Definition 6 we have 2, < Poo: It is clear that 7, and w , are 


sets of Ist and 2nd type on the infer [Poy Pools respectively. Therefore 


facta / u(x(t))dt + i u(x(t))dt, 


where 7} = [t),,¢'] U [t),,t2,] is a set of Ist type on the interval [p%,, p;, |. 


B. Now we assume that p;, > p4,- In this case there is a point BS lata 
such that pz(t') = pt, . Denote m = [tl ,t?,] and w; = [t?,,t']. Consider two 


gi? “91 gi? 
Cases. 


1. Let p?, > pj, Then there is a point t? € [t',t),] such that pax(t?)) = p%,. 
In this case we denote 




= Twig CaS le cand) wy = 6.6 |. 
Therefore 

tes 

/ u(a(t))dt = So / (oleae / wilali)) de, 

‘i, 1=1,29 Uw, wt 


where w is 2 set of 2nd type on the interval [p} ae De ls 
2. Let p3, < pj,- Then there is a point t? € (teh, | auch that px(t?)) = pj. 
In this case we denote 


my = [t5,,¢°], wo = [e', £5] and m = [7,22]. 


g2? > "92 
Therefore 
t2 
92 
ic if ‘ites ()ae+ f u(a(t))dt, 
ti 11 er Uw: Th 


where 7 is a set of 1st type on the interval bas Pyle 


We repeat this procedure taking $g_3, g_4, \ldots, g_N$ and thus the lemma is proved.


Lemma 9. Assume that $g_n \in G_i$, $n = 1, 2, \ldots$, $g_1 < g_2 < \cdots$, and $t^2 = \lim_{n \to \infty} t^2_{g_n}$. Then

\[ \int_{t^1_{g_1}}^{t^2} u(x(t))\,dt = \sum_n \int_{\pi_n \cup \omega_n} u(x(t))\,dt + \int_F u(x(t))\,dt. \]

Here $\pi_n$ and $\omega_n$ are sets of 1st and 2nd type in the interval $[p_n^2, p_n^1]$ and the set $F$ is either a set of 1st type in the interval $[p^*, p^1_{g_1}]$, if $p^* \le p^1_{g_1}$, or is a set of 2nd type in the interval $[p^1_{g_1}, p^*]$, if $p^* > p^1_{g_1}$.


Proof. We apply Lemma 8 for every $n$. From Proposition 1 we obtain that $x(t) \to x^*$ as $t \to t^2$, and therefore $p^2_{g_n} \to p^*$ as $n \to \infty$. This completes the proof.


We can prove the following lemmas in a similar manner to that used for 
proving Lemmas 8 and 9. 


Lemma 10. Assume that the set $G_i$ consists of a finite number of elements $g_k$, $g_1 > g_2 > \cdots > g_N$. Then

\[ \int_{t^1_{g_N}}^{t^2_{g_1}} u(x(t))\,dt = \sum_k \int_{\pi_k \cup \omega_k} u(x(t))\,dt + \int_F u(x(t))\,dt. \]

Here $\pi_k$ and $\omega_k$ are sets of 1st and 2nd type in the interval $[p_k^2, p_k^1]$ and the set $F$ is either a set of 1st type in the interval $[p^2_{g_1}, p^1_{g_N}]$, if $p^1_{g_N} \ge p^2_{g_1}$, or is a set of 2nd type in the interval $[p^1_{g_N}, p^2_{g_1}]$, if $p^1_{g_N} < p^2_{g_1}$.


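Lemma 11. Assume that $g_n \in G_i$, $n = 1, 2, \ldots$, $g_1 > g_2 > \cdots$, and $t^1 = \lim_{n \to \infty} t^1_{g_n}$. Then

\[ \int_{t^1}^{t^2_{g_1}} u(x(t))\,dt = \sum_n \int_{\pi_n \cup \omega_n} u(x(t))\,dt + \int_F u(x(t))\,dt. \]

Here $\pi_n$ and $\omega_n$ are sets of 1st and 2nd type in the interval $[p_n^2, p_n^1]$ and the set $F$ is either a set of 1st type in the interval $[p^2_{g_1}, p^*]$, if $p^* \ge p^2_{g_1}$, or is a set of 2nd type in the interval $[p^*, p^2_{g_1}]$, if $p^* < p^2_{g_1}$.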


In the next lemma we combine the results obtained by Lemmas 9 and 11. 


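Lemma 12. Assume that the set $G_i$ consists of elements $g_n$, $n = \pm 1, \pm 2, \ldots$, where $\cdots < g_{-2} < g_{-1} < g_1 < g_2 < \cdots$, and where $t^1 = \lim_{n \to -\infty} t^1_{g_n}$ and $t^2 = \lim_{n \to \infty} t^2_{g_n}$. Then

\[ \int_{t^1}^{t^2} u(x(t))\,dt = \sum_n \int_{\pi_n \cup \omega_n} u(x(t))\,dt. \]

Here $\pi_n$ and $\omega_n$ are sets of 1st and 2nd type in the interval $[p_n^2, p_n^1]$.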


Proof. We apply Lemmas 9 and 11 and obtain 


i: u(x(t))at = 7 / ial abs: / u(x(t))dt, (5.27) 
Z 


) u(a(t))dt = S~ / u(a(t))dt + ij u(a(t))dt. (5.28) 


TH ut! BU 

We define $\pi_0 = F' \cup F''$ and $\omega_0 = [t^2_{g_{-1}}, t^1_{g_1}]$. Clearly $\pi_0$ and $\omega_0$ are sets of 1st and 2nd type in the interval $[p^2_{g_{-1}}, p^1_{g_1}]$ (note that $p^2_{g_{-1}} < p^1_{g_1}$ by Definition 5). Therefore the lemma is proved if we sum (5.27) and (5.28).


5.6.2 


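Now we use Lemma 4. We take any interval $[t_k^1, t_k^2]$ and let

\[ [t_k^1, t_k^2] = [t_k^1, t^1_{G_k}] \cup [t^1_{G_k}, t^2_{G_k}] \cup [t^2_{G_k}, t_k^2]. \]

We show that

\[ \int_{t_k^1}^{t_k^2} u(x(t))\,dt = \sum_n \int_{\pi_n^k \cup \omega_n^k} u(x(t))\,dt + \int_{E^k} u(x(t))\,dt, \tag{5.29} \]

where $\pi_n^k$ and $\omega_n^k$ are sets of 1st and 2nd type in the interval $[p^2_{nk}, p^1_{nk}]$ and $x(t) \notin \operatorname{int} D$, $\forall t \in E^k$.

If the conditions of Lemma 12 hold then (5.29) is true if we take

\[ E^k = [t_k^1, t^1_{G_k}] \cup [t^2_{G_k}, t_k^2]. \]

Otherwise we apply Lemmas 8–11 and obtain

\[ \int_{t^1_{G_k}}^{t^2_{G_k}} u(x(t))\,dt = \sum_n \int_{\pi_n^k \cup \omega_n^k} u(x(t))\,dt + \int_{F^k} u(x(t))\,dt. \]

If $F^k$ is a set of 2nd type then (5.29) is true if we take $E^k = F^k$. Assume that $F^k$ is a set of 1st type on some interval $[p^2, p^1]$. In this case we set

\[ \pi_0^k = F^k \quad \text{and} \quad \omega_0^k = [t_k^1, t^1_{G_k}] \cup [t^2_{G_k}, t_k^2]. \]

We have $x(t_k^1) = x(t_k^2) = x^*$ (see Lemma 4) and therefore $\pi_0^k$ and $\omega_0^k$ are sets of 1st and 2nd type in the interval $[p^2, p^1]$. Thus (5.29) is true.

Now we apply Lemmas 8–12 to the intervals $[0, t^1]$ and $[t^2, T]$. We have

\[ \int_0^{t^1} u(x(t))\,dt = \sum_n \int_{\pi_n^0 \cup \omega_n^0} u(x(t))\,dt + \int_{F^0} u(x(t))\,dt + \int_{E^0} u(x(t))\,dt, \tag{5.30} \]

\[ \int_{t^2}^{T} u(x(t))\,dt = \sum_n \int_{\pi_n^T \cup \omega_n^T} u(x(t))\,dt + \int_{F^T} u(x(t))\,dt + \int_{E^T} u(x(t))\,dt. \tag{5.31} \]

Here


• $F^0$ and $F^T$ are sets of 1st type (they may be empty);
• $[0, t^1_{G_0}] \cup [t^2_{G_0}, t^1] \subset E^0$ and $[t^2, t^1_{G_T}] \cup [t^2_{G_T}, T] \subset E^T$;
• $x(t) \notin \operatorname{int} D$ for all $t \in E^0 \cup E^T$.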
Thus, applying Lemma 4 and taking into account (5.29)—(5.31), we can 
prove the following lemma. 


Lemma 13. The interval $[0, T]$ can be divided into subintervals such that

\[ [0, T] = \cup_n (\pi_n \cup \omega_n) \cup F_1 \cup F_2 \cup E, \tag{5.32} \]

\[ \int_0^T u(x(t))\,dt = \sum_n \int_{\pi_n \cup \omega_n} u(x(t))\,dt + \int_{F_1 \cup F_2} u(x(t))\,dt + \int_E u(x(t))\,dt. \tag{5.33} \]

Here


1. The sets $\pi_n$ and $\omega_n$ are sets of 1st and 2nd type, respectively, in the intervals $[p_n^2, p_n^1]$, $n = 1, 2, \ldots$.

2. The sets $F_1$ and $F_2$ are sets of 1st type in the intervals $[p_1^2, p_1^1]$ and $[p_2^2, p_2^1]$, respectively, and

\[ x(t) \in \operatorname{int} D, \quad \text{for all } t \in F_1 \cup F_2, \tag{5.34} \]

\[ p_i^1 - p_i^2 \le C < +\infty, \quad i = 1, 2. \tag{5.35} \]

3. Also

\[ x(t) \notin \operatorname{int} D, \quad \text{for all } t \in E. \tag{5.36} \]

4. For every $\delta > 0$ there is a number $C(\delta)$ such that

\[ \operatorname{meas}[(F_1 \cup F_2) \cap Z_\delta] \le C(\delta), \tag{5.37} \]

where the number $C(\delta) < \infty$ does not depend on the trajectory $x(t)$, on $T$ or on the intervals in (5.32).


Proof: We define $F_1 = \{t \in F^0 : x(t) \in \operatorname{int} D\}$, $F_2 = \{t \in F^T : x(t) \in \operatorname{int} D\}$ and $E = (\cup_k E^k) \cup E^0 \cup E^T$. Then we replace $\pi_1^0$ by $\pi_1^0 \cup (F^0 \setminus F_1)$ and $\pi_1^T$ by $\pi_1^T \cup (F^T \setminus F_2)$ in (5.30) and (5.31) (note that after these replacements the conditions of Definition 6 still hold). We obtain (5.33) summing (5.29)–(5.31). It is not difficult to see that all assertions of the lemma hold. Note that (5.35) follows from the fact that the trajectory $x(t)$ is uniformly bounded (see (5.3)). The inequality (5.37) follows from Lemma 17.4.1, taking into account Definition 6, and thus the lemma is proved.


Lemma 14. There is a number $L < +\infty$ such that

\[ \int_{F_1 \cup F_2} [u(x(t)) - u^*]\,dt \le L, \tag{5.38} \]

where $L$ does not depend on the trajectory $x(t)$, on $T$ or on the intervals in (5.32).


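Proof: From Condition H3 it follows that there exist a number $\varepsilon > 0$ and a trajectory $\tilde{x}(\cdot)$ to the system (6.1), defined on $[0, T_\varepsilon]$, such that $p\tilde{x}(0) = p^* - \varepsilon$, $p\tilde{x}(T_\varepsilon) = p^* + \varepsilon$ and

\[ p\dot{\tilde{x}}(t) > 0 \quad \text{for almost all } t \in [0, T_\varepsilon]. \tag{5.39} \]

Define

\[ R_\varepsilon = \int_{[0, T_\varepsilon]} u(\tilde{x}(t))\,dt. \]

Consider the set $F_1$ and corresponding interval $[p_1^2, p_1^1]$. Define a set

\[ F_1^\varepsilon = \{t \in F_1 : |px(t) - p^*| < \varepsilon\}. \]

We consider the most common case, when $[p^* - \varepsilon, p^* + \varepsilon] \subset [p_1^2, p_1^1]$. In this case the sets $F_1^\varepsilon$ and $[0, T_\varepsilon]$ are sets of 1st and 2nd type in the interval $[p^* - \varepsilon, p^* + \varepsilon]$ for the trajectories $x(\cdot)$ and $\tilde{x}(\cdot)$, respectively. We have

\[ \int_{F_1} u(x(t))\,dt = \int_{F_1^\varepsilon} u(x(t))\,dt + \int_{[0, T_\varepsilon]} u(\tilde{x}(t))\,dt - R_\varepsilon + \int_{F_1 \setminus F_1^\varepsilon} u(x(t))\,dt. \tag{5.40} \]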

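We use Lemma 7. Note that this lemma can be applied for the trajectory $\tilde{x}(\cdot)$ (which may not be continuously differentiable) due to the inequality (5.39). Taking into account $\delta(x) \ge 0$ and $u(x(t)) \le u^*$, for $t \in Q$, from Lemma 7 we obtain

\[ \int_{F_1^\varepsilon} u(x(t))\,dt + \int_{[0, T_\varepsilon]} u(\tilde{x}(t))\,dt \le u^* (\operatorname{meas} F_1^\varepsilon + T_\varepsilon). \tag{5.41} \]

From (5.37) it follows that

\[ \operatorname{meas}(F_1 \setminus F_1^\varepsilon) = \operatorname{meas}(F_1 \cap Z_\varepsilon) \le C(\varepsilon). \]

Thus

\[ \int_{F_1 \setminus F_1^\varepsilon} u(x(t))\,dt \le C_\varepsilon, \tag{5.42} \]

where the number $C_\varepsilon < +\infty$ does not depend on $T$ or on the trajectory $x(t)$. Denote $C' = T_\varepsilon u^* + C_\varepsilon - R_\varepsilon$. Then from (5.40)–(5.42) we obtain

\[ \int_{F_1} u(x(t))\,dt \le u^* \operatorname{meas} F_1^\varepsilon + C' \le u^* \operatorname{meas} F_1 + C' \]

and therefore

\[ \int_{F_1} [u(x(t)) - u^*]\,dt \le C'. \]

By analogy we can prove that

\[ \int_{F_2} [u(x(t)) - u^*]\,dt \le C''. \]

Thus the lemma is proved if we take $L = C' + C''$.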


5.7 The proof of Theorem 13.6


From Condition M it follows that for every $T > 0$ there exists a trajectory $x_T(\cdot) \in X_T$, for which

\[ \int_{[0, T]} u(x_T(t))\,dt \ge u^* T - b. \tag{5.43} \]


5.7.1


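First we consider the case when $x(t)$ is a continuously differentiable function. In this case we can apply Lemma 7.

From Lemmas 4 and 14 we have

\[ \int_{[0, T]} u(x(t))\,dt \le \sum_n \int_{\pi_n \cup \omega_n} u(x(t))\,dt + \int_E u(x(t))\,dt + L + u^* \operatorname{meas}(F_1 \cup F_2). \]

Then applying Lemma 7 we obtain

\[ \int_{[0, T]} u(x(t))\,dt \le \sum_n \Bigl[ u^* \operatorname{meas}(\pi_n \cup \omega_n) - \int_{Q_n} [u^* - u(x(t))]\,dt - \int_{E_n} \delta^2(x(t))\,dt \Bigr] + \int_E u(x(t))\,dt + L + u^* \operatorname{meas}(F_1 \cup F_2) \]

\[ = u^* \Bigl( \sum_n \operatorname{meas}(\pi_n \cup \omega_n) + \operatorname{meas}(F_1 \cup F_2) + \operatorname{meas} E \Bigr) - \int_Q [u^* - u(x(t))]\,dt - \int_A \delta^2(x(t))\,dt + L \]

\[ = u^* \operatorname{meas}[0, T] - \int_Q [u^* - u(x(t))]\,dt - \int_A \delta^2(x(t))\,dt + L. \]

Here $Q = (\cup_n Q_n) \cup E$ and $A = \cup_n E_n$. Taking (5.43) into account we have

\[ \int_{[0, T]} u(x(t))\,dt - \int_{[0, T]} u(x_T(t))\,dt \le -\int_Q [u^* - u(x(t))]\,dt - \int_A \delta^2(x(t))\,dt + L + b, \]

that is,

\[ J_T(x(\cdot)) - J_T(x_T(\cdot)) \le -\int_Q [u^* - u(x(t))]\,dt - \int_A \delta^2(x(t))\,dt + L + b. \tag{5.44} \]

Here

\[ Q = (\cup_n Q_n) \cup E \quad \text{and} \quad A = \cup_n E_n \tag{5.45} \]

and the following conditions hold:

a)

\[ Q \cup A = \{t \in [0, T] : x(t) \notin \operatorname{int} D\}; \tag{5.46} \]

b)

\[ [0, T] = \cup_n (\pi_n \cup \omega_n) \cup (F_1 \cup F_2) \cup E; \tag{5.47} \]

c) for every $\delta > 0$ there exist $K(\delta) < +\infty$ and $C(\delta) < +\infty$ such that

\[ \operatorname{meas}[(\pi_n \cup \omega_n) \cap Z_\delta] \le K(\delta) \operatorname{meas}[(Q_n \cup E_n) \cap Z_\delta] \quad \text{and} \tag{5.48} \]

\[ \operatorname{meas}[(F_1 \cup F_2) \cap Z_\delta] \le C(\delta); \tag{5.49} \]

(recalling that $Z_\delta = \{t \in [0, T] : |px(t) - p^*| \ge \delta\}$)
d) for every $\varepsilon > 0$ there exists $\delta_\varepsilon > 0$ such that

\[ \delta^2(x) \ge \delta_\varepsilon \quad \text{for all } x,\ \|x - x^*\| \ge \varepsilon. \tag{5.50} \]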


The first assertion of the theorem follows from (5.44), (5.46) and (5.50) for 
the case under consideration (that is, x(t) continuously differentiable). We 
now prove the second assertion. 

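Let $\varepsilon > 0$ and $\delta > 0$ be given numbers and $x(\cdot)$ a continuously differentiable $\xi$-optimal trajectory. We denote

\[ X_\varepsilon = \{t \in [0, T] : \|x(t) - x^*\| \ge \varepsilon\}. \]

First we show that there exists a number $K_{\varepsilon,\xi} < +\infty$ which does not depend on $T > 0$ and

\[ \operatorname{meas}[(Q \cup A) \cap X_\varepsilon] \le K_{\varepsilon,\xi}. \tag{5.51} \]

Assume that (5.51) is not true. In this case there exist sequences $T_k \to \infty$, $K^k_{\varepsilon,\xi} \to \infty$ and sequences of trajectories $\{x^k(\cdot)\}$ (every $x^k(\cdot)$ is a $\xi$-optimal trajectory in the interval $[0, T_k]$) and $\{x_{T_k}(\cdot)\}$ (satisfying (5.43) for every $T = T_k$) such that

\[ \operatorname{meas}[(Q^k \cup A^k) \cap X^k_\varepsilon] \ge K^k_{\varepsilon,\xi} \quad \text{as } k \to \infty. \tag{5.52} \]

From Lemma 17.3.1 and (5.50) we have

\[ u^* - u(x^k(t)) \ge \nu_\varepsilon \ \text{ if } t \in Q^k \cap X^k_\varepsilon, \qquad \delta^2(x^k(t)) \ge \delta_\varepsilon \ \text{ if } t \in A^k \cap X^k_\varepsilon. \]

Denote $\nu = \min\{\nu_\varepsilon, \delta_\varepsilon\} > 0$. From (5.44) it follows that

\[ J_{T_k}(x^k(\cdot)) - J_{T_k}(x_{T_k}(\cdot)) \le L + b - \nu \operatorname{meas}[(Q^k \cup A^k) \cap X^k_\varepsilon]. \]

Therefore, for sufficiently large numbers $k$, we have

\[ J_{T_k}(x^k(\cdot)) \le J_{T_k}(x_{T_k}(\cdot)) - 2\xi \le J^*_{T_k} - 2\xi, \]

which means that $x^k(t)$ is not a $\xi$-optimal trajectory. This is a contradiction.

Thus (5.51) is true.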
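Now we show that for every $\delta > 0$ there is a number $K'_{\delta,\xi} < +\infty$ such that

\[ \operatorname{meas} Z_\delta \le K'_{\delta,\xi}. \tag{5.53} \]

From (5.47)–(5.49) we have

\[ \operatorname{meas} Z_\delta = \sum_n \operatorname{meas}[(\pi_n \cup \omega_n) \cap Z_\delta] + \operatorname{meas}[(F_1 \cup F_2) \cap Z_\delta] + \operatorname{meas}(E \cap Z_\delta) \]

\[ \le \sum_n K(\delta) \operatorname{meas}[(Q_n \cup E_n) \cap Z_\delta] + C(\delta) + \operatorname{meas}(E \cap Z_\delta) \]

\[ \le \tilde{K}(\delta) \operatorname{meas}\bigl[([\cup_n (Q_n \cup E_n)] \cap Z_\delta) \cup (E \cap Z_\delta)\bigr] + C(\delta) \]

\[ = \tilde{K}(\delta) \operatorname{meas}[(Q \cup A) \cap Z_\delta] + C(\delta), \]

where $\tilde{K}(\delta) = \max\{1, K(\delta)\}$.

Since $Z_\delta \subset X_\delta$, taking (5.51) into account we obtain (5.53), where

\[ K'_{\delta,\xi} = \tilde{K}(\delta) K_{\delta,\xi} + C(\delta). \]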


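We denote $X^0_{\varepsilon/2} = \{t \in [0, T] : \|x(t) - x^*\| > \varepsilon/2\}$. Clearly $X^0_{\varepsilon/2}$ is an open set and therefore can be presented as a union of an at most countable number of open intervals, say $X^0_{\varepsilon/2} = \cup_k \tilde\tau_k$. Out of these intervals we choose further intervals, which have a nonempty intersection with $X_\varepsilon$, say these are $\tau_k$, $k = 1, 2, \ldots$. Then we have

\[ X_\varepsilon \subset \cup_k \tau_k \subset X^0_{\varepsilon/2}. \tag{5.54} \]

Since a derivative of the function $x(t)$ is bounded, it is not difficult to see that there exists a number $\sigma_\varepsilon > 0$ such that

\[ \operatorname{meas} \tau_k \ge \sigma_\varepsilon \quad \text{for all } k. \tag{5.55} \]

But the interval $[0, T]$ is bounded and therefore the number of intervals $\tau_k$ is finite too. Let $k = 1, 2, 3, \ldots, N_T(\varepsilon)$. We divide every interval $\tau_k$ into two parts:

\[ \tau_k^1 = \{t \in \tau_k : x(t) \in \operatorname{int} D\} \quad \text{and} \quad \tau_k^2 = \{t \in \tau_k : x(t) \notin \operatorname{int} D\}. \]

From (5.46) and (5.54) we obtain

\[ \cup_k \tau_k^2 \subset (Q \cup A) \cap X^0_{\varepsilon/2} \]

and therefore from (5.51) it follows that

\[ \operatorname{meas}(\cup_k \tau_k^2) \le K_{\varepsilon/2,\xi}. \tag{5.56} \]

Now we apply Lemma 17.4.1. We have

\[ p\dot{x}(t) \le -\eta_{\varepsilon/2}, \quad t \in \cup_k \tau_k^1. \tag{5.57} \]

Define $p_k^1 = \sup_{t \in \tau_k} px(t)$ and $p_k^2 = \inf_{t \in \tau_k} px(t)$. It is clear that

\[ p_k^1 - p_k^2 \le C, \quad k = 1, 2, 3, \ldots, N_T(\varepsilon), \tag{5.58} \]

and

\[ |p\dot{x}(t)| \le K \quad \text{for all } t. \tag{5.59} \]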


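Here the numbers $C$ and $K$ do not depend on $T > 0$, $x(\cdot)$, $\varepsilon$ or $\xi$. We divide the interval $\tau_k$ into three parts:

\[ \tau_k^- = \{t \in \tau_k : p\dot{x}(t) < 0\}, \quad \tau_k^0 = \{t \in \tau_k : p\dot{x}(t) = 0\} \quad \text{and} \quad \tau_k^+ = \{t \in \tau_k : p\dot{x}(t) > 0\}. \]

Then we have $\tau_k = \tau_k^- \cup \tau_k^0 \cup \tau_k^+$.

We define $\alpha = -\int_{\tau_k^-} p\dot{x}(t)\,dt$ and $\beta = \int_{\tau_k^+} p\dot{x}(t)\,dt$. Clearly $\alpha > 0$, $\beta > 0$ and

\[ p_k^1 - p_k^2 \ge \begin{cases} -\alpha + \beta, & \text{if } \alpha < \beta, \\ \alpha - \beta, & \text{if } \alpha \ge \beta. \end{cases} \tag{5.60} \]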
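
The case distinction in (5.60) rests on an elementary fact: the spread between the largest and smallest values of a scalar signal is at least the absolute value of its net change, and the net change equals total increase minus total decrease. The following sketch checks this on sampled data; it is an illustration only, and the function name `split_variation` and the sample values are hypothetical stand-ins for $px(t)$ on one interval $\tau_k$.

```python
from typing import Sequence, Tuple


def split_variation(q: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, beta): total decrease and total increase of the samples q."""
    alpha = 0.0  # discrete analogue of -integral of p x'(t) over the set where p x' < 0
    beta = 0.0   # discrete analogue of the integral of p x'(t) over the set where p x' > 0
    for previous, current in zip(q, q[1:]):
        step = current - previous
        if step < 0:
            alpha -= step
        else:
            beta += step
    return alpha, beta


# Hypothetical samples of p x(t) on one interval tau_k.
samples = [2.0, 1.0, 1.5, 0.0, 0.5]
alpha, beta = split_variation(samples)
spread = max(samples) - min(samples)  # plays the role of p_k^1 - p_k^2
print(alpha, beta, spread, spread >= abs(alpha - beta))
```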


From (5.59) we obtain 
0 < B < Kmeasr;. (5.61) 
On the other hand, 7 C 7, and therefore from (5.57) we have 
a > mesgmeast, > Me/2meas Tp. (5.62) 


Consider two cases. 
a) a> @. Then from (5.60)—(5.62) we obtain 


C > p—-p > a-pe Ne/2 Meas Tj, — Kmeas7,’. (5.63) 


Since 7° C 77, then from (5.56) it follows that meast < Rajpré : 
Therefore from (5.63) we have 


measT, < Onn where Oe: =(C+K. Kejae)/Me/2 (5.64) 
b) a < @. Then from (5.61) and (5.62) we obtain 
Ne/2measT, < Kmeastj < K- Kejeg 
or 
measT, < Ci, where Cl, = K- Ke/2¢ /Ne/2+ (5.65) 
Thus from (5.64) and (5.65) we obtain 
measT, < Ceg = max{C,¢,Coe}, K=1,2,..,Nr), 
and then 


meas (Ux 7) < Nr(e) Cee. (5.66) 
Now we show that for every $\varepsilon > 0$ and $\xi > 0$ there exists a number
$K'_{\varepsilon,\xi} < +\infty$ such that
$$\operatorname{meas}\,(\cup_k \tau_k^1) \le K'_{\varepsilon,\xi}. \tag{5.67}$$


Assume that (5.67) is not true. Then from (5.66) it follows that Nr(¢) — co 
as T — oo. Consider the intervals tT, for which the following conditions hold: 


1 
measT, > 5 7 and measT? < \meas7;, (5.68) 


where 2 is any fixed number. Since Nr(e) — oo, then from (5.55) and (5.56) 
it follows that the number of intervals 7; satisfying (5.68) increases infinitely 
as T’ > oo. 
On the other hand, the number of intervals 7, for which the conditions 
a < 2B, 
meas 7, > Ameas 7, and A= Ne/2/K 


hold, is finite. Therefore the number of intervals $\tau_k$ for which the conditions 
a < 6 and (5.68) hold infinitely increases as T — oo. We denote the number 
of such intervals by Nr and for the sake of definiteness assume that these are 
intervals 7, k= 1,2,...,Nr. 

We set A = e/2/2K for every Ty. Then from (5.63) and (5.68) we have 


1 
Dy — Py > NesgmeasT, — Roe measT, = = MNe/2 meas T;,. 
2K 2 
Taking (5.55) into account we obtain 
Pe—-De = ee; k=1,2,...,Nr, (5.69) 
where 
1 
Ce = ZNe/20e > O and Nr -~ owoasT—o. 


2 
Let 5 = } ec. From (5.69) it follows that for every 7; there exists an interval 


d, 2 [sz, 82] C Tr such that 


|pa(t) —p*|>6, tedx,  p2x(s,) = sup pa(t), 
ted, 


pa(sz) = inf pa(t) and pa(sg) — p2x(sz) = 6. 


ted, 
From (5.59) we have 
jo= p x (t) dt| < i |p v(pjats f |p x (t)| dt < K-measdg. 
[sh 82 [sp.82 dx 
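Then $\operatorname{meas} d_k \ge \delta/K > 0$. Clearly $d_k \subset Z_\delta$ and therefore

$$\operatorname{meas} Z_\delta \ge \operatorname{meas}\Big(\bigcup_{k=1}^{N_T} d_k\Big) = \sum_{k=1}^{N_T} \operatorname{meas} d_k \ge N_T\,\frac{\delta}{K}.$$

This means that $\operatorname{meas} Z_\delta \to \infty$ as $T \to \infty$, which contradicts (5.53).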
Thus (5.67) is true. Then taking (5.56) into account we obtain 


meas Up Tr = S “(meas 74 + meas 7p) < Ree meee 
k 


Therefore from (5.54) it follows that 
meas X,_ = meas Ug Te < Kee, 


where Ke,¢ =Keyre +KZ ¢. 
Thus we have proved that the second assertion of the theorem is true for
the case when x(t) is a continuously differentiable function.


5.7.2 


We now take any trajectory $x(\cdot)$ to the system (6.1). It is known (see, for
example, [3]) that for a given number $\delta > 0$ (we take $\delta < \varepsilon/2$) there exists a
continuously differentiable trajectory $\tilde x(\cdot)$ to the system (6.1) such that

$$\|x(t) - \tilde x(t)\| \le \delta \quad \text{for all } t \in [0,T].$$


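Since the function u is continuous then there exists $\eta(\delta) > 0$ such that

$$u(\tilde x(t)) \ge u(x(t)) - \eta(\delta) \quad \text{for all } t \in [0,T].$$

Therefore

$$\int_{[0,T]} u(\tilde x(t))\,dt \ge \int_{[0,T]} u(x(t))\,dt - T\eta(\delta).$$

Let $\xi > 0$ be a given number. For every $T > 0$ we choose a number $\delta$ such
that $T\eta(\delta) \le \xi$. Then

$$\int_{[0,T]} u(x(t))\,dt \le \int_{[0,T]} u(\tilde x(t))\,dt + T\eta(\delta) \le \int_{[0,T]} u(\tilde x(t))\,dt + \xi, \tag{5.70}$$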
that is,

$$\int_{[0,T]} [u(x(t)) - u^*]\,dt \le \int_{[0,T]} [u(\tilde x(t)) - u^*]\,dt + \xi.$$

Since the function $\tilde x(\cdot)$ is continuously differentiable then the second integral
in this inequality is bounded (see the first part of the proof), and therefore
the first assertion of the theorem is proved.


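Now we prove the second assertion of Theorem 13.6. We will use (5.70).
Take a number $\xi > 0$ and assume that $x(\cdot)$ is a $\xi$-optimal trajectory, that is,

$$J_T(x(\cdot)) \ge J_T^* - \xi.$$

From (5.70) we have

$$J_T(\tilde x(\cdot)) \ge J_T(x(\cdot)) - \xi \ge J_T^* - 2\xi.$$

Thus $\tilde x(\cdot)$ is a continuously differentiable $2\xi$-optimal trajectory. That is why
(see the first part of the proof) for the numbers $\varepsilon/2 > 0$ and $2\xi > 0$ there
exists $K_{\varepsilon,\xi} < +\infty$ such that

$$\operatorname{meas}\{t \in [0,T] : \|\tilde x(t) - x^*\| > \varepsilon/2\} \le K_{\varepsilon,\xi}.$$

If $\|x(t') - x^*\| \ge \varepsilon$ for any $t'$ then

$$\|\tilde x(t') - x^*\| \ge \|x(t') - x^*\| - \|x(t') - \tilde x(t')\| \ge \varepsilon - \delta > \varepsilon/2.$$

Therefore

$$\{t \in [0,T] : \|x(t) - x^*\| \ge \varepsilon\} \subset \{t \in [0,T] : \|\tilde x(t) - x^*\| > \varepsilon/2\},$$

which implies that the proof of the second assertion of the theorem is completed, that is,

$$\operatorname{meas}\{t \in [0,T] : \|x(t) - x^*\| \ge \varepsilon\} \le K_{\varepsilon,\xi}.$$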


Now we prove the third assertion of the theorem.
Let $x(\cdot)$ be an optimal trajectory and $x(t_1) = x(t_2) = x^*$. Consider a
trajectory $x^*(\cdot)$ defined by the formula

$$x^*(t) = \begin{cases} x(t) & \text{if } t \in [0,t_1] \cup [t_2,T], \\ x^* & \text{if } t \in [t_1,t_2]. \end{cases}$$

Assume that the third assertion of the theorem is not true, that is, there
is a point $t' \in (t_1,t_2)$ such that $\|x(t') - x^*\| = c > 0$.

Consider the function $x(\cdot)$. In [8] it is proved that there is a sequence of
continuously differentiable trajectories $x_n(\cdot)$, $t \in [t_1,T]$, which is uniformly
convergent to $x(\cdot)$ on $[t_1,T]$ and for which $x_n(t_1) = x(t_1) = x^*$. That is, for
every $\delta > 0$ there exists a number $N_\delta$ such that

$$\max_{t \in [t_1,T]} \|x_n(t) - x(t)\| \le \delta \quad \text{for all } n \ge N_\delta.$$

On the other hand, for every $\delta > 0$ there exists a number $\eta(\delta) > 0$ such that
$\eta(\delta) \to 0$ as $\delta \to 0$ and

$$|u(x(t)) - u(x_n(t))| \le \eta(\delta) \quad \text{for all } t \in [t_1,T]. \tag{5.71}$$

Then we have

$$\int_{[t_1,T]} u(x(t))\,dt \le \int_{[t_1,T]} u(x_n(t))\,dt + T\eta(\delta). \tag{5.72}$$


Take a sequence of points $t^n \in (t',t_2)$ such that $t^n \to t_2$ as $n \to \infty$. Clearly
in this case $x_n(t^n) \to x^*$. We apply Lemma 13 for the interval $[t_1,t^n]$ and
obtain (see also (5.31))


u(a,(t)) dt 
[é1,t”] 


= i u(an(t)) dt + if u(an(t)) dt + if u(an(t)) dt. (5.73) 


B pntyyn En En 


Here x(t) € intD Vt € F” and F” is a set of lst type on the interval 
[ptn(t”), p*] if pa, (t") < p*. 

Since z,(t") — «x*, px,(t”) > p* and thus for every t € F” we have 
u(an(t)) > u* as n — oo. Therefore 


Qn = | [u(an(t)) — u*]Jdt +0 as noo. 


Fr 


We also note that from x,(t) ¢ int D, t € E”, it follows that 


u(an(t)) dt < u* meas E”. 


En 
Now we use Lemma 7 and obtain 


ye i u(an(t)) dt 


kon n 
Ty Uw 


= u* meas [Ug(7 Uwe)] — / [u* — u(an(t))] dt — i 6°(an(t)) dt. 
UpQzR UpER 
We take a number 6 < c/2. Then there exists a number B> 0 such that 


Nw 


meas [Ux (Q% U EZ)] > BG. 


Then there exists a number @ > 0 for which 


ss i u(tn(t)) dt < u* meas [Uz (2 Uw?)] — 6. 


kon n 
Ty Uw) 


Therefore from (5.73) we have 


u(an(t)) dt < u* {meas [Ux (77 U w7)] + meas F” + meas F"} 
[ta ,t”] 


+ An — 8 
or 
u(z,(t)) dt < u*(t" —t1) +a, — fp. (5.74) 
[ta ,t”] 
From (5.71) we obtain 


/ u(a,(t)) dt 


[t2,T] 


< i: ulead TOY if u(a*(t))dt+Tn(6). (6.75) 


[ta ,T] [te ,T] 


Thus from (5.72)—(5.75) we have 
$$\begin{aligned}
\int_{[t_1,T]} u(x(t))\,dt
&\le \int_{[t_1,T]} u(x_n(t))\,dt + T\eta(\delta) \\
&= \int_{[t_1,t^n]} u(x_n(t))\,dt + \int_{[t^n,t_2]} u(x_n(t))\,dt + \int_{[t_2,T]} u(x_n(t))\,dt + T\eta(\delta) \\
&\le u^*(t^n - t_1) + u^*(t_2 - t^n) + \int_{[t_2,T]} u(x^*(t))\,dt + \alpha_n - \beta + \lambda_n + 2T\eta(\delta) \\
&= \int_{[t_1,T]} u(x^*(t))\,dt + \alpha_n - \beta + \lambda_n + 2T\eta(\delta).
\end{aligned}$$

Here

$$\lambda_n = \int_{[t^n,t_2]} [u(x_n(t)) - u^*]\,dt \to 0 \quad \text{as } n \to \infty,$$

because $t^n \to t_2$. We choose the numbers $\delta > 0$ and $n$ such that the following
inequality holds:

$$\alpha_n + \lambda_n + 2T\eta(\delta) < \beta.$$

In this case we have

$$\int_{[t_1,T]} u(x(t))\,dt < \int_{[t_1,T]} u(x^*(t))\,dt$$

and therefore

$$\int_{[0,T]} u(x(t))\,dt < \int_{[0,T]} u(x^*(t))\,dt,$$

which means that x(t) is not optimal. This is a contradiction.
Thus the theorem is proved. 
Chapter 6 
Pontryagin principle with a PDE: 
a unified approach 


B. D. Craven 


Abstract A Pontryagin principle is obtained for a class of optimal control 
problems with dynamics described by a partial differential equation. The 
method, using Karush-Kuhn—Tucker necessary conditions for a mathematical 
program, is almost identical to that for ordinary differential equations. 


Key words: Optimal control, Pontryagin principle, partial differential equa- 
tion, Karush—Kuhn—Tucker conditions 


6.1 Introduction 


Pontryagin’s principle has been proved in at least four ways, for an optimal 
control problem in continuous time with dynamics described by an ordinary 
differential equation (ODE). One approach ([5], [6]) regards the control prob- 
lem as a mathematical program, and uses the Karush—Kuhn—Tucker (KKT) 
necessary conditions as the starting point (though with some different hy- 
potheses) for deriving the Pontryagin theory. There are various results for 
optimal control when the dynamics are described by a partial differential 
equation (PDE), often derived (as, for example, by Lions and Bensoussan) 
using variational inequalities, which are generally equivalent to mathematical
programs in infinite dimensions. The results in [1]-[5], and others by
the same authors, obtain some versions of Pontryagin’s principle by quite 
different methods to those used for ODEs. However, the Pontryagin theory 
involving a PDE can also be derived from the mathematical programming 
approach, using the KKT conditions, and replacing the time variable t by a
space variable z, say in $\mathbb{R}^2$ or $\mathbb{R}^3$, or by (t, z) combined. Whatever approach
is followed requires a good deal of detailed calculation, concerned with choice
of function spaces (suitable Sobolev spaces), and proofs of differentiability 
properties. These details are omitted here (they are adequately treated for 
example in [1], [3]), since the aim here is to show that a Pontryagin principle 
readily follows. The results depend indeed on certain differentiability prop- 
erties, stated in what follows, but only indirectly on how these properties are 
achieved. 


6.2 Pontryagin for an ODE 


Consider first an optimal control problem with an ODE: 


$$\text{MIN } J(u) := F(x,u) := \int_0^T f(x(t),u(t),t)\,dt \quad \text{subject to} \tag{6.1}$$
$$x(0) = x_0, \quad \dot x(t) = m(x(t),u(t),t), \quad u(t) \in \Gamma(t) \quad (0 \le t \le T).$$


Here x(.) is the state function, u(.) is the control function, the time interval
[0, T] is fixed, f and m are differentiable functions. Other details such as
variable horizon T, an endpoint constraint on x(T), and state constraints, can 
readily be added to the problem. They are omitted here, since the purpose 
is to show the method. The steps are as follows. 

(a) The problem (6.1) is expressed as a mathematical program: 


$$\text{MIN}_{x \in X,\, u \in U}\; J(u) := F(x,u) \quad \text{subject to} \quad Dx = M(x,u), \quad u \in \Gamma,$$


over suitable function spaces X and U; X is chosen so that the differential 
operator D := (d/dt) is a continuous linear mapping (see Note 1 in the 
Appendix). 

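(b) Assume temporarily that F and M are differentiable with respect to
(x, u). Then necessary KKT conditions for a minimum at $(x,u) = (\bar x,\bar u)$ are

$$F_x(\bar x,\bar u) + \bar\lambda\,(-D + M_x(\bar x,\bar u)) = 0, \tag{6.2}$$
$$(F_u(\bar x,\bar u) + \bar\lambda M_u(\bar x,\bar u))(u - \bar u) \ge 0 \quad (\forall u \in \Gamma), \tag{6.3}$$

with a Lagrange multiplier $\bar\lambda$. Represent $\bar\lambda$ by a function $\lambda(.)$, where

$$(\forall w \in C[0,T]) \quad \langle \bar\lambda, w \rangle = \int_0^T \lambda(t)\,w(t)\,dt.$$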


Define the Hamiltonian 


$$h(x(t),u(t),t,\lambda(t)) := f(x(t),u(t),t) + \lambda(t)\,m(x(t),u(t),t)$$

and

$$H(x,u,\bar\lambda) := F(x,u) + \bar\lambda M(x,u) = \int_0^T h(x(t),u(t),t,\lambda(t))\,dt.$$


In what follows, differentiability will be assumed only with respect to x, not
u, so that (6.3) is not available. The multiplier $\bar\lambda$ remains, satisfying (6.2),
provided that the operator $-D + M_x(\bar x,\bar u)$ is assumed surjective.

(c) Integrating the $-\bar\lambda D$ term in (6.2) by parts leads to

$$-D\lambda = (F + \bar\lambda M)_x(\bar x,\bar u),$$

if the integrated part vanishes. Choosing a boundary condition to do this,
the adjoint differential equation is obtained:

$$-\dot\lambda(t) = h_x(\bar x(t),\bar u(t),t,\lambda(t)), \quad \lambda(T) = 0. \tag{6.4}$$


(d) Assume that Dx = M(x,u) defines x as a Lipschitz function of u, and
that (see Note 2 in the Appendix)

$$F(x,u) - F(\bar x,u) = F_x(\bar x,u)(x - \bar x) + o(\|x - \bar x\| + \|u - \bar u\|), \tag{6.5}$$

with a similar requirement for M. Then minimality of $(\bar x,\bar u)$, namely that
$F(x,u) - F(\bar x,\bar u) \ge 0$, with (6.2), leads (see [7], Theorem 7.2.3) to

$$H(\bar x,u,\bar\lambda) - H(\bar x,\bar u,\bar\lambda) = F(x,u) - F(\bar x,\bar u) + o(\|u - \bar u\|) \ge o(\|u - \bar u\|), \tag{6.6}$$

describing a quasimin (see [6]) of $H(\bar x,.,\bar\lambda)$ over $\Gamma(.)$ at $\bar u$. (Note that there
is no requirement of convexity on $\Gamma(.)$.)

(e) Assuming that $\bar u$ is a minimum in terms of the $L^1$ norm, suppose if
possible that

$$h(\bar x(t),u(t),t,\lambda(t)) < h(\bar x(t),\bar u(t),t,\lambda(t))$$

for t in a set of positive measure. Then (see Note 3 in the Appendix) a set
of control functions $\{u_\beta(.) : \beta \ge 0\} \subset \Gamma$ is constructed (see [7], Theorem
7.2.6), for which
for some constant c > 0, thus contradicting (6.6). (A required chattering
property holds automatically for the considered control constraint.) This has
proved Pontryagin's principle, in the following form.


Theorem 1. Let the control problem (6.1) reach a local minimum at $(x,u) = (\bar x,\bar u)$
with respect to the $L^1$-norm for the control u. Assume that the differential
equation Dx = M(x,u) determines x as a Lipschitz function of u, that the
differentiability property (6.5) (with respect to x) holds, and that $-D + M_x(\bar x,\bar u)$
is surjective. Then necessary conditions for the minimum are that the costate
$\lambda(.)$ satisfies the adjoint equation (6.4), and that $h(\bar x(t),.,t,\lambda(t))$ is minimized
over $\Gamma(t)$ at $\bar u(t)$, for almost all t.
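
As an illustration of how the adjoint equation and the minimization of the Hamiltonian are used together, here is a small worked example. The particular problem is an assumption chosen for simplicity and does not come from the chapter: take $f(x,u,t) = x$, $m(x,u,t) = u$, $x_0 = 0$ and $\Gamma(t) = [-1,1]$.

```latex
\begin{align*}
&\text{Problem:} && \min \int_0^T x(t)\,dt, \quad \dot x(t) = u(t),\ x(0) = 0,\ u(t) \in [-1,1]. \\
&\text{Hamiltonian:} && h(x,u,t,\lambda) = x + \lambda u. \\
&\text{Adjoint equation (6.4):} && -\dot\lambda(t) = h_x = 1,\ \lambda(T) = 0
   \ \Longrightarrow\ \lambda(t) = T - t. \\
&\text{Minimize } h \text{ over } u \in [-1,1]: && \lambda(t) > 0 \text{ for } t < T
   \ \Longrightarrow\ \bar u(t) = -1 \text{ for almost all } t. \\
&\text{Resulting state and cost:} && \bar x(t) = -t, \qquad J(\bar u) = -T^2/2.
\end{align*}
```

This agrees with a direct check: every feasible control gives $x(t) \ge -t$, so no admissible control can produce a cost below $-T^2/2$.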


6.3 Pontryagin for an elliptic PDE 


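Denote by $\Omega$ a closed bounded region in $\mathbb{R}^3$ (or $\mathbb{R}^2$), with boundary $\partial\Omega$,
and disjoint sets $A_i$ (i = 1,2,3,4) whose union is $\partial\Omega$. The optimal problem
considered is:

$$\text{MIN}_{x(.),\,u(.)}\; J(u) := \int_\Omega f(x(z),u(z),z)\,dz$$

subject to

$$(\forall z \in \Omega) \quad Dx(z) = m(x(z),u(z),z),$$
$$(\forall z \in A_1) \quad x(z) = x_0(z),$$
$$(\forall z \in A_2) \quad (\nabla x(z)).n(z) = g_0(z),$$
$$(\forall z \in \Omega) \quad u(z) \in \Gamma(z). \tag{6.9}$$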


Here D is an elliptic linear partial differential operator, such as the Laplacian
$\nabla^2$, and n = n(z) denotes the outward-pointing unit normal vector to
$\partial\Omega$ at $z \in \partial\Omega$. The constraint on the control u(z) is specified in terms of
a given set-valued function $\Gamma(z)$. The precise way in which x(.) satisfies the
PDE (6.8) need not be specified here; instead, some specific properties of the
solution will be required. The function spaces must be chosen so that D is
a continuous linear mapping. This holds, in particular, for $D = \nabla^2$, with
Sobolev spaces, if $x \in W_0^2(\Omega)$ and $u \in W_0^1(\Omega)$. It is further required that
(6.7) determines x(.) as a Lipschitz function of u(.). The boundary $\partial\Omega$ of the
region need only be smooth enough that Green's theorem can be applied to
it.

The Hamiltonian is 


$$h(x(z),u(z),z,\lambda(z)) := f(x(z),u(z),z) + \lambda(z)\,m(x(z),u(z),z). \tag{6.10}$$


The steps of Section 6.2 are now applied, but replacing $t \in [0,T]$ by $z \in \Omega$. It
is observed that steps (a), (b), (d) and (e) remain valid; they do not depend
on $t \in \mathbb{R}$. Step (c) requires a replacement for integration by parts. If $D = \nabla^2$,
it is appropriate to use Green's theorem in the form

$$\int_\Omega [\lambda \nabla^2 x - x \nabla^2 \lambda]\,dv = \int_{\partial\Omega} [\lambda(\partial x/\partial n) - x(\partial\lambda/\partial n)]\,ds,$$

in which dv and ds denote elements of volume and surface. The right side of
(6.10) becomes the integrated part; the origin can be shifted, in the spaces of
functions, to move $x(z) = x_0(z)$ to $x(z) = 0$, and a similar replacement for
the normal component $(\nabla x(z)).n(z)$; so the contributions to the integrated
part from $A_1$ and $A_2$ vanish already. The remaining contributions vanish if
boundary conditions are imposed:

$$\lambda(z) = 0 \text{ on } A_3; \quad \partial\lambda/\partial n = 0 \text{ on } A_4 \ (\text{thus } \nabla\lambda(z).n(z) = 0 \text{ on } A_4). \tag{6.11}$$

Then (6.2) leads to the adjoint PDE

$$D^*\lambda(z) = \partial h(\bar x(z),\bar u(z),z;\lambda(z))/\partial x(z),$$

with boundary conditions (6.11), where $D^*$ denotes the adjoint linear operator
to D. Here, with $D = \nabla^2$, (6.10) shows that $D^* = \nabla^2$ also. Then (e),
with $z \in \Omega$ replacing $t \in [0,T]$, gives Pontryagin's principle in the form:
$h(\bar x(z),.,z,\lambda(z))$ is minimized over $\Gamma(z)$ at $\bar u(z)$, possibly except for a set of
z of zero measure.

If f and m happen to be linear in u, and if $\Gamma(z)$ is a polyhedron with
vertices $p_i$ (or an interval if $u(z) \in \mathbb{R}$), then Pontryagin's principle may lead
to bang-bang control, namely $u(z) = p_i$ when $z \in E_i$, for some disjoint sets
$E_i \subset \Omega$.


6.4 Pontryagin for a parabolic PDE 


Now consider a control problem with dynamics described by the PDE 
Ax (z,t)/It = 2V2a(2,t) + m(a(z,t), ulz,t), 2,2), 


where V2? acts on the variable z. Here t (for the ODE) has been replaced by 
(t, z) € [0,7] x Q, for a closed bounded region 2 C R., and where m(.) isa 
forcing function. Define the linear differential operator D := (0/0t) — ?V? . 
The function spaces must be chosen so that D is a continuous linear mapping. 
Define A; C 02 as in Section 6.3. The optimal control problem now becomes 
(with a certain choice of boundary conditions) 


T 
MIN2(),u(.) IF (u) = i) f(x(z, t), u(z, t), 2, t)dtdz 
0) ‘7 


subject to 
Then steps (a), (b), (d) and (e) proceed as in Section 6.3, for an elliptic PDE. 
The Hamiltonian is 


$$h(x(z,t),u(z,t),z,t,\lambda(z,t)) := f(x(z,t),u(z,t),z,t) + \lambda(z,t)\,m(x(z,t),u(z,t),z,t).$$


Step (c) (integration by parts) is replaced by the following (where I := [0,7] 
and 0 := (0/0t)) : 


—\Dzr = — f [Mendel nazar 
--f Az, t)[0 — V2]a(z, t)dedt 


Q 
=} def Ore.0)a (z, t)dt 
2 


i / eS t))a(z, t)dz, 


applying integration by parts to 6 and Green’s theorem to V2, provided 
that the “integrated parts” vanish. Since x(z,t) is given for t = 0, and for 
z€ A, UA» C ON, it suffices if 


(Vz)A(z,T) = 0, 
so that fo [A(z, t)x(z, t)]> = 0, and if 
(vt € [0, T]) A(z, t) = 0 on Ag; Va(z,t-n(z,t) =0 on Ag. 
With these boundary conditions, the adjoint PDE becomes 
—(0/08) (2,1) = 2 V2X(z, 2). 


Then (e), with $(z,t) \in \Omega \times I$ replacing $t \in [0,T]$, gives Pontryagin's principle
in the form:

$h(\bar x(z,t),.,z,t,\lambda(z,t))$ is minimized over $\Gamma(z)$ at $\bar u(z,t)$,
possibly except for a set of (z,t) of zero measure.


Concerning bang-bang control, a similar remark to that in Section 6.3 
applies here also. 


6.5 Appendix 


Note 1. The linear mapping D is continuous if x(.) is given a graph norm:

$$\|x\| := \|x\|^* + \|Dx\|^*,$$

where $\|\cdot\|^*$ denotes a given norm, such as $\|x\|_\infty$ or $\|x\|_2$.

Note 2. It follows from Gronwall's inequality that the mapping from u
(with $L_1$ norm) to x (with $L_\infty$ or $L_2$ norm) is Lipschitz if m(.) satisfies a
Lipschitz condition. The differentiability property (6.5) replaces the usual
$F_x(\bar x,\bar u)$ by $F_x(\bar x,u)$. This holds (using the first mean value theorem) if f
and m have bounded second derivatives.

Similar results are conjectured for the case of partial differential equations.

Note 3. The construction depends on the (local) minimum being reached
when u has the $L_1$-norm, and on the constraint $(\forall z)\ u(z) \in \Gamma(z)$ having the
chattering property, that if u and v are feasible controls, then w is a feasible
control, defining w(z) = u(z) for $z \in \Omega_1 \subset \Omega$ and w(z) = v(z) for $z \in \Omega \setminus \Omega_1$.
For Section 6.4, substitute (z,t) for z here.


Acknowledgments The author thanks two referees for pointing out ambiguities and 
Chapter 7 


A turnpike property for discrete-time 
control systems in metric spaces 


Alexander J. Zaslavski 


Abstract In this work we study the structure of “approximate” solutions
for a nonautonomous infinite dimensional discrete-time control system
determined by a sequence of continuous functions $v_i : X \times X \to R^1$,
$i = 0, \pm 1, \pm 2, \ldots$ where X is a metric space.


Key words: Discrete-time control system, metric space, turnpike property 


7.1 Introduction 


Let X be a metric space and let $\rho(\cdot,\cdot)$ be the metric on X. For the set $X \times X$
we define a metric $\rho_1(\cdot,\cdot)$ by

$$\rho_1((x_1,x_2),(y_1,y_2)) = \rho(x_1,y_1) + \rho(x_2,y_2), \quad x_1,x_2,y_1,y_2 \in X.$$

Let Z be the set of all integers. Denote by M the set of all sequences
of functions $v = \{v_i\}_{i=-\infty}^{\infty}$, where $v_i : X \times X \to R^1$ is bounded from below for
each $i \in Z$. Such a sequence of functions $\{v_i\}_{i=-\infty}^{\infty} \in M$ will occasionally be
denoted by a boldface v (similarly $\{u_i\}_{i=-\infty}^{\infty}$ will be denoted by u, etc.)

The set M is equipped with the metric d defined by

$$\tilde d(v,u) = \sup\{|v_i(x,y) - u_i(x,y)| : (x,y) \in X \times X,\ i \in Z\}, \tag{1.1}$$

$$d(v,u) = \tilde d(v,u)(1 + \tilde d(v,u))^{-1}, \quad u,v \in M.$$

In this paper we investigate the structure of “approximate” solutions of 
the optimization problem 
$$\sum_{i=k_1}^{k_2-1} v_i(x_i,x_{i+1}) \to \min, \quad \{x_i\}_{i=k_1}^{k_2} \subset X, \quad x_{k_1} = y,\ x_{k_2} = z \tag{P}$$

where $v = \{v_i\}_{i=-\infty}^{\infty} \in M$, $y,z \in X$ and $k_2 > k_1$ are integers.

The interest in these discrete-time optimal problems stems from the study 
of various optimization problems which can be reduced to this framework, 
for example, continuous-time control systems which are represented by ordi- 
nary differential equations whose cost integrand contains a discounting factor 
(see [Leizarowitz (1985)]), the infinite-horizon control problem of minimizing
$\int_0^T L(z,z')\,dt$ as $T \to \infty$ (see [Leizarowitz (1989), Zaslavski (1996)]) and
the analysis of a long slender bar of a polymeric material under tension in 
[Leizarowitz and Mizel (1989), Marcus and Zaslavski (1999)]. Similar opti- 
mization problems are also considered in mathematical economics (see 
[Dzalilov et al. (2001), Dzalilov et al. (1998), Makarov, Levin and Rubinov 
(1995), Makarov and Rubinov (1973), Mamedov and Pehlivan (2000), Mame- 
dov and Pehlivan (2001), McKenzie (1976), Radner (1961), Rubinov (1980), 
Rubinov (1984)]). Note that the problem (P) was studied in [Zaslavski (1995)]
when X was a compact metric space and $v_i = v_0$ for all integers i.

For each $v \in M$, each $m_1,m_2 \in Z$ such that $m_2 > m_1$ and each $z_1,z_2 \in X$
set

$$\sigma(v,m_1,m_2,z_1,z_2) = \inf\Big\{\sum_{i=m_1}^{m_2-1} v_i(x_i,x_{i+1}) : \{x_i\}_{i=m_1}^{m_2} \subset X,\ x_{m_1} = z_1,\ x_{m_2} = z_2\Big\}. \tag{1.2}$$


If the space of states X is compact and $v_i$ is continuous for all integers
i, then the problem (P) has a solution for each $y,z \in X$ and each pair of
integers $k_2 > k_1$. For the noncompact space X the existence of solutions
of the problem (P) is not guaranteed and in this situation we consider
$\delta$-approximate solutions.

Let $v \in M$, $y,z \in X$, $k_2 > k_1$ be integers and let $\delta$ be a positive number.
We say that a sequence $\{x_i\}_{i=k_1}^{k_2} \subset X$ satisfying $x_{k_1} = y$, $x_{k_2} = z$ is a
$\delta$-approximate solution of the problem (P) if

$$\sum_{i=k_1}^{k_2-1} v_i(x_i,x_{i+1}) \le \sigma(v,k_1,k_2,y,z) + \delta.$$

In this chapter we study the structure of $\delta$-approximate solutions of the
problem (P).
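
To make the definitions of $\sigma$ and of a $\delta$-approximate solution concrete, the following Python sketch evaluates them by dynamic programming. It assumes a finite state set X (the chapter allows a general metric space), and the names `sigma`, `cost`, `is_delta_approximate` and the example cost `v` are hypothetical illustrations rather than anything defined in the chapter.

```python
from math import inf


def sigma(v, m1, m2, z1, z2, states):
    """Minimal value of sum_{i=m1}^{m2-1} v(i, x_i, x_{i+1}) with fixed end points.

    Assumption: `states` is a finite list standing in for the space X.
    """
    # best[x] = cheapest cost of reaching state x at the current time from z1
    best = {x: (0.0 if x == z1 else inf) for x in states}
    for i in range(m1, m2):
        best = {y: min(best[x] + v(i, x, y) for x in states) for y in states}
    return best[z2]


def cost(v, m1, xs):
    """Cost of the sequence xs = (x_{m1}, ..., x_{m2})."""
    return sum(v(m1 + j, xs[j], xs[j + 1]) for j in range(len(xs) - 1))


def is_delta_approximate(v, m1, xs, delta, states):
    """Check cost(xs) <= sigma(v, m1, m2, x_{m1}, x_{m2}) + delta."""
    m2 = m1 + len(xs) - 1
    return cost(v, m1, xs) <= sigma(v, m1, m2, xs[0], xs[-1], states) + delta


# Hypothetical autonomous cost that is cheapest when staying at 0.
def v(i, x, y):
    return x * x + y * y + (y - x) ** 2


states = [-1, 0, 1]
optimal_value = sigma(v, 0, 4, 1, 1, states)
```

For this example the sequence (1, 0, 0, 0, 1) attains the optimal value, while (1, 1, 0, 0, 1) exceeds it by 2 and is therefore a $\delta$-approximate solution only for $\delta \ge 2$.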

Definition: Let $v = \{v_i\}_{i=-\infty}^{\infty} \in M$ and $\{\bar x_i\}_{i=-\infty}^{\infty} \subset X$. We say that v
has the turnpike property (TP) and $\{\bar x_i\}_{i=-\infty}^{\infty}$ is the turnpike for v if for each
$\varepsilon > 0$ there exist $\delta > 0$ and a natural number N such that for each pair of
integers $m_1, m_2$ satisfying $m_2 \ge m_1 + 2N$ and each sequence $\{x_i\}_{i=m_1}^{m_2} \subset X$
satisfying

$$\sum_{i=m_1}^{m_2-1} v_i(x_i,x_{i+1}) \le \sigma(v,m_1,m_2,x_{m_1},x_{m_2}) + \delta$$

there exist $\tau_1 \in \{m_1,\ldots,m_1+N\}$ and $\tau_2 \in \{m_2-N,\ldots,m_2\}$ such that

$$\rho(x_i,\bar x_i) \le \varepsilon, \quad i = \tau_1,\ldots,\tau_2.$$

Moreover, if $\rho(x_{m_1},\bar x_{m_1}) \le \delta$, then $\tau_1 = m_1$, and if $\rho(x_{m_2},\bar x_{m_2}) \le \delta$, then
$\tau_2 = m_2$.

This property was studied in [Zaslavski (2000)] for sequences of functions 
v which satisfy certain uniform boundedness and uniform continuity assump- 
tions. We showed that a generic v has the turnpike property. 

The turnpike property is very important for applications. Suppose that a
sequence of cost functions $v \in M$ has the turnpike property and we know
a finite number of approximate solutions of the problem (P). Then we know
the turnpike $\{\bar x_i\}_{i=-\infty}^{\infty}$, or at least its approximation, and the constant N
which is an estimate for the time period required to reach the turnpike. This
information can be useful if we need to find an “approximate” solution of the
problem (P) with a new time interval $[k_1,k_2]$ and the new values $y,z \in X$
at the end points $k_1$ and $k_2$. Namely instead of solving this new problem
on the “large” interval $[k_1,k_2]$ we can find an “approximate” solution of the
problem (P) on the “small” interval $[k_1,k_1+N]$ with the values $y, \bar x_{k_1+N}$
at the end points and an approximate solution of the problem (P) on the
“small” interval $[k_2-N,k_2]$ with the values $\bar x_{k_2-N}, z$ at the end points.
Then the concatenation of the first solution, the sequence $\{\bar x_i\}_{i=k_1+N}^{k_2-N}$ and the
second solution is an approximate solution of the problem (P) on the interval
$[k_1,k_2]$ with the values y, z at the end points. Sometimes as an “approximate”
solution of the problem (P) we can choose any sequence $\{x_i\}_{i=k_1}^{k_2}$ satisfying

$$x_{k_1} = y, \quad x_{k_2} = z \quad\text{and}\quad x_i = \bar x_i \ \text{for all } i = k_1+N,\ldots,k_2-N.$$

This sequence is a $\delta$-approximate solution where the constant $\delta$ does not
depend on $k_1, k_2$ and y, z. The constant $\delta$ is not necessarily a “small” number
but it may be sufficient for practical needs especially if the length of the
interval $[k_1,k_2]$ is large.
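
The concatenation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the chapter's procedure: the state set is finite, the two short problems are solved exactly by dynamic programming, the turnpike is known and passed in as a function, and $k_2 - k_1 > 2N$. The names `solve`, `concatenate`, `path_cost` and the example cost `v` are hypothetical.

```python
from math import inf


def solve(v, m1, m2, z1, z2, states):
    """Exact minimiser of sum_{i=m1}^{m2-1} v(i, x_i, x_{i+1}), x_{m1}=z1, x_{m2}=z2.

    Returns (optimal value, optimal path). Assumes `states` is finite.
    """
    best = {x: (0.0 if x == z1 else inf) for x in states}
    back = []  # back[j][y] = best predecessor of y at time m1 + j + 1
    for i in range(m1, m2):
        new_best, pred = {}, {}
        for y in states:
            x_star = min(states, key=lambda x: best[x] + v(i, x, y))
            new_best[y] = best[x_star] + v(i, x_star, y)
            pred[y] = x_star
        best = new_best
        back.append(pred)
    path = [z2]
    for pred in reversed(back):
        path.append(pred[path[-1]])
    path.reverse()
    return best[z2], path


def concatenate(v, k1, k2, y, z, N, turnpike, states):
    """Short head problem + turnpike + short tail problem (needs k2 - k1 > 2N)."""
    _, head = solve(v, k1, k1 + N, y, turnpike(k1 + N), states)
    _, tail = solve(v, k2 - N, k2, turnpike(k2 - N), z, states)
    middle = [turnpike(i) for i in range(k1 + N + 1, k2 - N)]
    return head + middle + tail


def path_cost(v, m1, xs):
    return sum(v(m1 + j, xs[j], xs[j + 1]) for j in range(len(xs) - 1))


# Hypothetical example: the cost is smallest when staying at 0, so the
# constant sequence 0 plays the role of the turnpike.
def v(i, x, y):
    return x * x + y * y + (y - x) ** 2


states = [-1, 0, 1]
path = concatenate(v, 0, 10, 1, -1, 2, lambda i: 0, states)
full_value, _ = solve(v, 0, 10, 1, -1, states)
```

Only two problems of length N are solved, regardless of how long the interval $[k_1,k_2]$ is; in this toy example the concatenated sequence happens to be exactly optimal, whereas in general it is only $\delta$-approximate.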

The turnpike property is well known in mathematical economics. The term 
was first coined by Samuelson in 1948 (see [Samuelson (1965)]) where he 
showed that an efficient expanding economy would spend most of the time in 
the vicinity of a balanced equilibrium path (also called a von Neumann path). 
This property was further investigated in [Dzalilov et al. (2001), Dzalilov et al. 
(1998), Makarov, Levin and Rubinov (1995), Makarov and Rubinov (1973), 
Mamedov and Pehlivan (2000), Mamedov and Pehlivan (2001), McKenzie 
(1976), Radner (1961), Rubinov (1980), Rubinov (1984)] for optimal trajec- 
tories of models of economic dynamics. 
The chapter is organized as follows. In Section 2 we study the stability
of the turnpike phenomenon. In Section 3 we show that if $\{\bar x_i\}_{i=-\infty}^{\infty}$ is the
turnpike for $v = \{v_i\}_{i=-\infty}^{\infty} \in M$ and $v_i$ is continuous for each integer i,
then for each pair of integers $k_2 > k_1$ the sequence $\{\bar x_i\}_{i=k_1}^{k_2}$ is a solution
of the problem (P) with $y = \bar x_{k_1}$ and $z = \bar x_{k_2}$. In Section 4 we show that
under certain assumptions the turnpike property is equivalent to its weakened
version.


7.2 Stability of the turnpike phenomenon 


In this section we prove the following result. 


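Theorem 1. Assume that $v = \{v_i\}_{i=-\infty}^{\infty} \in M$ has the turnpike property and
$\{\bar x_i\}_{i=-\infty}^{\infty} \subset X$ is the turnpike for v. Then the following property holds:

For each $\varepsilon > 0$ there exist $\delta > 0$, a natural number N and a neighborhood
U of v in M such that for each $u \in U$, each pair of integers $m_1, m_2$ satisfying
$m_2 \ge m_1 + 2N$ and each sequence $\{x_i\}_{i=m_1}^{m_2} \subset X$ satisfying

$$\sum_{i=m_1}^{m_2-1} u_i(x_i,x_{i+1}) \le \sigma(u,m_1,m_2,x_{m_1},x_{m_2}) + \delta \tag{2.1}$$

there exist $\tau_1 \in \{m_1,\ldots,m_1+N\}$ and $\tau_2 \in \{m_2-N,\ldots,m_2\}$ such that

$$\rho(x_i,\bar x_i) \le \varepsilon, \quad i = \tau_1,\ldots,\tau_2. \tag{2.2}$$

Moreover, if $\rho(x_{m_1},\bar x_{m_1}) \le \delta$, then $\tau_1 = m_1$, and if $\rho(x_{m_2},\bar x_{m_2}) \le \delta$, then
$\tau_2 = m_2$.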
Proof. Let $\varepsilon > 0$. It follows from the property (TP) that there exist

$$\delta_0 \in (0, \varepsilon/4) \tag{2.3}$$

and a natural number $N_0$ such that the following property holds:

(P1) for each pair of integers $m_1, m_2 \ge m_1 + 2N_0$ and each sequence
$\{x_i\}_{i=m_1}^{m_2} \subset X$ satisfying

$$\sum_{i=m_1}^{m_2-1} v_i(x_i,x_{i+1}) \le \sigma(v,m_1,m_2,x_{m_1},x_{m_2}) + 4\delta_0 \tag{2.4}$$

there exist $\tau_1 \in \{m_1,\ldots,m_1+N_0\}$, $\tau_2 \in \{m_2-N_0,\ldots,m_2\}$ such that (2.2)
holds. Moreover, if $\rho(x_{m_1},\bar x_{m_1}) \le 4\delta_0$, then $\tau_1 = m_1$, and if $\rho(x_{m_2},\bar x_{m_2}) \le 4\delta_0$
then $\tau_2 = m_2$.

It follows from the property (TP) that there exist

$$\delta \in (0, \delta_0/4) \tag{2.5}$$

and a natural number $N_1$ such that the following property holds:

(P2) For each pair of integers $m_1, m_2 \ge m_1 + 2N_1$ and each sequence
$\{x_i\}_{i=m_1}^{m_2} \subset X$ which satisfies

$$\sum_{i=m_1}^{m_2-1} v_i(x_i,x_{i+1}) \le \sigma(v,m_1,m_2,x_{m_1},x_{m_2}) + 4\delta$$

there exist $\tau_1 \in \{m_1,\ldots,m_1+N_1\}$, $\tau_2 \in \{m_2-N_1,\ldots,m_2\}$ such that

$$\rho(x_i,\bar x_i) \le \delta_0, \quad i \in \{\tau_1,\ldots,\tau_2\}. \tag{2.6}$$

Set

$$N = 4(N_1 + N_0) \tag{2.7}$$

and

$$U = \{u \in M : |u_i(x,y) - v_i(x,y)| \le (8N)^{-1}\delta,\ (x,y) \in X \times X\}. \tag{2.8}$$


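Assume that $u \in U$, $m_1,m_2 \in Z$, $m_2 \ge m_1 + 2N$,

$$\{x_i\}_{i=m_1}^{m_2} \subset X \quad\text{and}\quad \sum_{i=m_1}^{m_2-1} u_i(x_i,x_{i+1}) \le \sigma(u,m_1,m_2,x_{m_1},x_{m_2}) + \delta. \tag{2.9}$$

Let

$$k \in \{m_1,\ldots,m_2\}, \quad m_2 - k \ge 2N. \tag{2.10}$$

(2.9) implies that

$$\sum_{i=k}^{k+2N-1} u_i(x_i,x_{i+1}) \le \sigma(u,k,k+2N,x_k,x_{k+2N}) + \delta. \tag{2.11}$$

By (2.11), (2.7) and (2.8),

$$\Big|\sum_{i=k}^{k+2N-1} u_i(x_i,x_{i+1}) - \sum_{i=k}^{k+2N-1} v_i(x_i,x_{i+1})\Big| \le \delta/4,$$

$$|\sigma(u,k,k+2N,x_k,x_{k+2N}) - \sigma(v,k,k+2N,x_k,x_{k+2N})| \le \delta/4$$

and

$$\begin{aligned}
\sum_{i=k}^{k+2N-1} v_i(x_i,x_{i+1}) &\le \sum_{i=k}^{k+2N-1} u_i(x_i,x_{i+1}) + \delta/4 \\
&\le \delta/4 + \sigma(u,k,k+2N,x_k,x_{k+2N}) + \delta \\
&\le \sigma(v,k,k+2N,x_k,x_{k+2N}) + \delta + \delta/4 + \delta/4.
\end{aligned} \tag{2.12}$$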
We have that (2.12) holds for any k satisfying (2.10). This fact implies
that

$$\sum_{i=k_1}^{k_2-1} v_i(x_i,x_{i+1}) \le \sigma(v,k_1,k_2,x_{k_1},x_{k_2}) + 2^{-1} \cdot 3\delta \tag{2.13}$$

for each pair of integers $k_1,k_2 \in \{m_1,\ldots,m_2\}$ such that

$$k_1 < k_2 \le k_1 + 2N.$$

It follows from (2.13), (2.7) and the property (P2) that for any integer $k \in
\{m_1,\ldots,m_2\}$ satisfying $m_2 - k \ge 2N_0 + 2N_1$, there exists an integer q such
that $q - k \in [2N_0, 2N_0 + 2N_1]$ and

$$\rho(x_q,\bar x_q) \le \delta_0.$$

This fact implies that there exists a finite strictly increasing sequence of
integers $\{\tau_j\}_{j=0}^{s}$ such that

$$\rho(x_{\tau_j},\bar x_{\tau_j}) \le \delta_0, \quad j = 0,\ldots,s, \tag{2.14}$$
$$m_1 \le \tau_0 \le 2N_0 + 2N_1 + m_1, \quad \text{if } \rho(x_{m_1},\bar x_{m_1}) \le \delta_0, \text{ then } \tau_0 = m_1, \tag{2.15}$$
$$\tau_{j+1} - \tau_j \in [2N_0, 2N_0 + 2N_1], \quad j = 0,\ldots,s-1 \tag{2.16}$$

and

$$m_2 - 2N_0 - 2N_1 < \tau_s \le m_2. \tag{2.17}$$

It follows from (2.13), (2.7), (2.5), (2.14) and (2.16) that for $j = 0,\ldots,s-1$

$$\rho(x_i,\bar x_i) \le \varepsilon, \quad i \in \{\tau_j,\ldots,\tau_{j+1}\}.$$

Thus $\rho(x_i,\bar x_i) \le \varepsilon$, $i \in \{\tau_0,\ldots,\tau_s\}$ with $\tau_0 \le m_1 + N$, $\tau_s \ge m_2 - N$. By
(2.15) if $\rho(x_{m_1},\bar x_{m_1}) \le \delta_0$, then $\tau_0 = m_1$.
Assume that

$$\rho(x_{m_2},\bar x_{m_2}) \le \delta_0. \tag{2.18}$$

To complete the proof of the theorem it is sufficient to show that

$$\rho(x_i,\bar x_i) \le \varepsilon, \quad i \in \{\tau_s,\ldots,m_2\}.$$

By (2.17) and (2.16)

$$m_2 - \tau_{s-1} = m_2 - \tau_s + \tau_s - \tau_{s-1} \in [2N_0, 4N_0 + 4N_1]. \tag{2.19}$$

By (2.13), (2.19) and (2.7),

$$\sum_{i=\tau_{s-1}}^{m_2-1} v_i(x_i,x_{i+1}) \le \sigma(v,\tau_{s-1},m_2,x_{\tau_{s-1}},x_{m_2}) + 2^{-1} \cdot 3\delta. \tag{2.20}$$

It follows from (2.19), (2.20), (2.18), (2.14) and the property (P1) that

$$\rho(x_i,\bar x_i) \le \varepsilon, \quad i = \tau_{s-1},\ldots,m_2.$$

Theorem 1 is proved.


7.3 A turnpike is a solution of the problem (P) 


In this section we show that if $\{\bar x_i\}_{i=-\infty}^{\infty}$ is the turnpike for $v = \{v_i\}_{i=-\infty}^{\infty} \in
M$ and $v_i$ is continuous for each integer i, then for each pair of integers
$k_2 > k_1$ the sequence $\{\bar x_i\}_{i=k_1}^{k_2}$ is a solution of the problem (P) with $y = \bar x_{k_1}$
and $z = \bar x_{k_2}$. We prove the following result.

Theorem 2. Let $v = \{v_i\}_{i=-\infty}^{\infty} \in M$, and $\{\bar x_i\}_{i=-\infty}^{\infty} \subset X$. Assume that $v_i$
is continuous for all $i \in Z$, v has the turnpike property and $\{\bar x_i\}_{i=-\infty}^{\infty}$ is the
turnpike for v. Then for each pair of integers $m_1, m_2 > m_1$,

$$\sum_{i=m_1}^{m_2-1} v_i(\bar x_i,\bar x_{i+1}) = \sigma(v,m_1,m_2,\bar x_{m_1},\bar x_{m_2}).$$

Proof. Assume the contrary. Then there exist a pair of integers $m_1, m_2 > m_1$,
a sequence $\{x_i\}_{i=m_1}^{m_2}$ and a number $\Delta > 0$ such that

$$x_{m_1} = \bar x_{m_1}, \quad x_{m_2} = \bar x_{m_2},$$
$$\sum_{i=m_1}^{m_2-1} v_i(x_i,x_{i+1}) < \sum_{i=m_1}^{m_2-1} v_i(\bar x_i,\bar x_{i+1}) - \Delta. \tag{3.1}$$

There exists $\varepsilon \in (0, \Delta/4)$ such that the following property holds:
(P3) if $i \in \{m_1 - 1,\ldots,m_2 + 1\}$, $z_1,z_2 \in X$ and

$$\rho(z_1,\bar x_i),\ \rho(z_2,\bar x_{i+1}) \le \varepsilon,$$

then

$$|v_i(\bar x_i,\bar x_{i+1}) - v_i(z_1,z_2)| \le \Delta[64(m_2 - m_1 + 4)]^{-1}. \tag{3.2}$$

By the property (TP) there exist $\delta \in (0,\varepsilon/4)$ and a natural number N such
that for each pair of integers $n_1, n_2 \ge n_1 + 2N$ and each sequence $\{y_i\}_{i=n_1}^{n_2} \subset X$
satisfying

$$\sum_{i=n_1}^{n_2-1} v_i(y_i,y_{i+1}) \le \sigma(v,n_1,n_2,y_{n_1},y_{n_2}) + \delta \tag{3.3}$$

the following inequality holds:

$$\rho(y_i,\bar x_i) \le \varepsilon, \quad i = n_1+N,\ldots,n_2-N. \tag{3.4}$$
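There exists $\{y_i\}_{i=m_1-4N}^{m_2+4N} \subset X$ such that

$$y_{m_1-4N} = \bar x_{m_1-4N}, \quad y_{m_2+4N} = \bar x_{m_2+4N}, \tag{3.5}$$

and

$$\sum_{i=m_1-4N}^{m_2+4N-1} v_i(y_i,y_{i+1}) \le \sigma(v,m_1-4N,m_2+4N,y_{m_1-4N},y_{m_2+4N}) + \delta/8. \tag{3.6}$$

By (3.5), (3.6) and the definition of $\delta$, N (see (3.3) and (3.4))

$$\rho(y_i,\bar x_i) \le \varepsilon, \quad i = m_1-3N,\ldots,m_2+3N. \tag{3.7}$$

Define $\{\tilde y_i\}_{i=m_1-4N}^{m_2+4N} \subset X$ by

$$\tilde y_i = y_i, \quad i \in \{m_1-4N,\ldots,m_1-1\} \cup \{m_2+1,\ldots,m_2+4N\}, \tag{3.8}$$
$$\tilde y_i = x_i, \quad i \in \{m_1,\ldots,m_2\}.$$

We will estimate

$$\sum_{i=m_1-4N}^{m_2+4N-1} v_i(y_i,y_{i+1}) - \sum_{i=m_1-4N}^{m_2+4N-1} v_i(\tilde y_i,\tilde y_{i+1}).$$

By (3.8) and (3.1),

$$\begin{aligned}
&\sum_{i=m_1-4N}^{m_2+4N-1} v_i(y_i,y_{i+1}) - \sum_{i=m_1-4N}^{m_2+4N-1} v_i(\tilde y_i,\tilde y_{i+1}) \\
&\quad= \sum_{i=m_1-1}^{m_2} [v_i(y_i,y_{i+1}) - v_i(\tilde y_i,\tilde y_{i+1})] \\
&\quad= v_{m_1-1}(y_{m_1-1},y_{m_1}) - v_{m_1-1}(\tilde y_{m_1-1},\tilde y_{m_1}) + v_{m_2}(y_{m_2},y_{m_2+1}) \\
&\qquad - v_{m_2}(\tilde y_{m_2},\tilde y_{m_2+1}) + \sum_{i=m_1}^{m_2-1} [v_i(y_i,y_{i+1}) - v_i(\tilde y_i,\tilde y_{i+1})] \\
&\quad= v_{m_1-1}(y_{m_1-1},y_{m_1}) - v_{m_1-1}(y_{m_1-1},\bar x_{m_1}) + v_{m_2}(y_{m_2},y_{m_2+1}) \\
&\qquad - v_{m_2}(\bar x_{m_2},y_{m_2+1}) + \sum_{i=m_1}^{m_2-1} [v_i(y_i,y_{i+1}) - v_i(\bar x_i,\bar x_{i+1})] \\
&\qquad + \sum_{i=m_1}^{m_2-1} [v_i(\bar x_i,\bar x_{i+1}) - v_i(x_i,x_{i+1})].
\end{aligned} \tag{3.9}$$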
By (3.7) and the property (P3),

$$|v_{m_1-1}(y_{m_1-1},y_{m_1}) - v_{m_1-1}(y_{m_1-1},\bar x_{m_1})| \le 2\Delta[64(m_2 - m_1 + 4)]^{-1}, \tag{3.10}$$

$$|v_{m_2}(y_{m_2},y_{m_2+1}) - v_{m_2}(\bar x_{m_2},y_{m_2+1})| \le 2\Delta[64(m_2 - m_1 + 4)]^{-1}, \tag{3.11}$$

$$|v_i(y_i,y_{i+1}) - v_i(\bar x_i,\bar x_{i+1})| \le \Delta[64(m_2 - m_1 + 4)]^{-1}, \quad i = m_1,\ldots,m_2-1. \tag{3.12}$$

It follows from (3.9), (3.12), (3.11) and (3.1) that

$$\begin{aligned}
\sum_{i=m_1-4N}^{m_2+4N-1} [v_i(y_i,y_{i+1}) - v_i(\tilde y_i,\tilde y_{i+1})]
&\ge -\Delta[64(m_2 - m_1 + 4)]^{-1}(m_2 - m_1 + 4) \\
&\quad + \sum_{i=m_1}^{m_2-1} [v_i(\bar x_i,\bar x_{i+1}) - v_i(x_i,x_{i+1})] \\
&\ge \Delta - \Delta/64 > 2\delta.
\end{aligned}$$

Combined with (3.8) this fact contradicts (3.6). The contradiction we have
reached proves the theorem.


7.4 A turnpike result 


In this section we show that under certain assumptions the turnpike property 
is equivalent to its weakened version. 


Theorem 3. Let $v = \{v_i\}_{i=-\infty}^{\infty} \in \mathcal{M}$, $\{\bar{x}_i\}_{i=-\infty}^{\infty} \subset X$ and

$$\sum_{i=m_1}^{m_2-1} v_i(\bar{x}_i, \bar{x}_{i+1}) = \sigma(v, m_1, m_2, \bar{x}_{m_1}, \bar{x}_{m_2}) \tag{4.1}$$

for each pair of integers $m_1, m_2 > m_1$. Assume that the following two properties hold:

(i) for any $\epsilon > 0$ there exists $\delta > 0$ such that for each $i \in Z$ and each
$x_1, x_2, y_1, y_2 \in X$ satisfying $\rho(x_j, y_j) \le \delta$, $j = 1, 2$,

$$|v_i(x_1, x_2) - v_i(y_1, y_2)| \le \epsilon;$$

(ii) for each $\epsilon > 0$ there exist $\delta > 0$ and a natural number $N$ such that for
each pair of integers $m_1, m_2 \ge m_1 + 2N$ and each sequence $\{x_i\}_{i=m_1}^{m_2} \subset X$
satisfying

$$\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + \delta \tag{4.2}$$

the inequality
$$\rho(x_i, \bar{x}_i) \le \epsilon, \quad i = m_1 + N, \dots, m_2 - N$$
holds. Then $v$ has the property (TP) and $\{\bar{x}_i\}_{i=-\infty}^{\infty}$ is the turnpike for $v$.


Proof. We show that the following property holds:
(C) Let $\epsilon > 0$. Then there exists $\delta > 0$ such that for each integer $m$, each
natural number $k$ and each sequence $\{x_i\}_{i=m}^{m+k} \subset X$ satisfying

$$\rho(x_i, \bar{x}_i) \le \delta, \quad i = m, m + k, \tag{4.3}$$

$$\sum_{i=m}^{m+k-1} v_i(x_i, x_{i+1}) \le \sigma(v, m, m + k, x_m, x_{m+k}) + \delta$$

the inequality
$$\rho(x_i, \bar{x}_i) \le \epsilon, \quad i = m, \dots, m + k \tag{4.4}$$
holds.

Let $\epsilon > 0$. There exist $\epsilon_0 \in (0, \epsilon/2)$ and a natural number $N$ such that for
each pair of integers $m_1, m_2 \ge m_1 + 2N$ and each sequence $\{x_i\}_{i=m_1}^{m_2} \subset X$
satisfying

$$\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + 2\epsilon_0$$

the inequality
$$\rho(x_i, \bar{x}_i) \le \epsilon, \quad i = m_1 + N, \dots, m_2 - N \tag{4.5}$$
holds.

By the property (i) there is $\delta \in (0, \epsilon_0/8)$ such that for each integer $i$
and each $z_1, z_2, y_1, y_2 \in X$ satisfying $\rho(z_j, y_j) \le \delta$, $j = 1, 2$, the following
inequality holds:

$$|v_i(z_1, z_2) - v_i(y_1, y_2)| \le \epsilon_0/16. \tag{4.6}$$

Assume that $m$ is an integer, $k$ is a natural number and a sequence $\{x_i\}_{i=m}^{m+k} \subset X$
satisfies (4.3). We will show that (4.4) is true. Clearly we may assume
without loss of generality that $k > 1$.
Define a sequence $\{z_i\}_{i=m}^{m+k} \subset X$ by

$$z_i = x_i, \ i = m, m + k, \qquad z_i = \bar{x}_i, \ i \in \{m, \dots, m + k\} \setminus \{m, m + k\}. \tag{4.7}$$


(4.3) and (4.7) imply that

$$\sum_{i=m}^{m+k-1} v_i(x_i, x_{i+1}) \le \sum_{i=m}^{m+k-1} v_i(z_i, z_{i+1}) + \delta$$
$$\le \delta + \sum_{i=m}^{m+k-1} v_i(\bar{x}_i, \bar{x}_{i+1}) + |v_m(\bar{x}_m, \bar{x}_{m+1}) - v_m(x_m, \bar{x}_{m+1})|$$
$$+ |v_{m+k-1}(\bar{x}_{m+k-1}, \bar{x}_{m+k}) - v_{m+k-1}(\bar{x}_{m+k-1}, x_{m+k})|.$$

Combined with (4.3) and the definition of $\delta$ (see (4.6)) this inequality
implies that

$$\sum_{i=m}^{m+k-1} v_i(x_i, x_{i+1}) \le \delta + \epsilon_0/8 + \sum_{i=m}^{m+k-1} v_i(\bar{x}_i, \bar{x}_{i+1}). \tag{4.8}$$


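Define $\{y_i\}_{i=m-2N}^{m+k+2N} \subset X$ by

$$y_i = \bar{x}_i, \quad i \in \{m - 2N, \dots, m - 1\} \cup \{m + k + 1, \dots, m + k + 2N\}, \tag{4.9}$$
$$y_i = x_i, \quad i \in \{m, \dots, m + k\}.$$

It follows from (4.9), (4.3) and the definition of $\delta$ (see (4.6)) that

$$|v_{m-1}(\bar{x}_{m-1}, \bar{x}_m) - v_{m-1}(y_{m-1}, y_m)| \le \epsilon_0/16$$

and
$$|v_{m+k}(\bar{x}_{m+k}, \bar{x}_{m+k+1}) - v_{m+k}(y_{m+k}, y_{m+k+1})| \le \epsilon_0/16.$$

Combined with (4.9) and (4.8) these inequalities imply that

$$\sum_{i=m-1}^{m+k} v_i(y_i, y_{i+1}) \le v_{m-1}(\bar{x}_{m-1}, \bar{x}_m) + \frac{\epsilon_0}{16} + v_{m+k}(\bar{x}_{m+k}, \bar{x}_{m+k+1}) + \frac{\epsilon_0}{16} + \sum_{i=m}^{m+k-1} v_i(x_i, x_{i+1})$$
$$\le \epsilon_0/8 + v_{m-1}(\bar{x}_{m-1}, \bar{x}_m) + v_{m+k}(\bar{x}_{m+k}, \bar{x}_{m+k+1}) + \delta + \epsilon_0/8 + \sum_{i=m}^{m+k-1} v_i(\bar{x}_i, \bar{x}_{i+1})$$
$$< \epsilon_0/2 + \sum_{i=m-1}^{m+k} v_i(\bar{x}_i, \bar{x}_{i+1}). \tag{4.10}$$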
By (4.9), (4.10) and (4.1),

$$\sum_{i=m-2N}^{m+k+2N-1} v_i(y_i, y_{i+1}) = \sum_{i=m-2N}^{m-2} v_i(y_i, y_{i+1}) + \sum_{i=m-1}^{m+k} v_i(y_i, y_{i+1}) + \sum_{i=m+k+1}^{m+k+2N-1} v_i(y_i, y_{i+1})$$
$$\le \sum_{i=m-2N}^{m-2} v_i(\bar{x}_i, \bar{x}_{i+1}) + \epsilon_0/2 + \sum_{i=m-1}^{m+k} v_i(\bar{x}_i, \bar{x}_{i+1}) + \sum_{i=m+k+1}^{m+k+2N-1} v_i(\bar{x}_i, \bar{x}_{i+1})$$
$$= \epsilon_0/2 + \sum_{i=m-2N}^{m+k+2N-1} v_i(\bar{x}_i, \bar{x}_{i+1})$$
$$= \epsilon_0/2 + \sigma(v, m - 2N, m + k + 2N, \bar{x}_{m-2N}, \bar{x}_{m+k+2N})$$
$$= \epsilon_0/2 + \sigma(v, m - 2N, m + k + 2N, y_{m-2N}, y_{m+k+2N}).$$

Thus

$$\sum_{i=m-2N}^{m+k+2N-1} v_i(y_i, y_{i+1}) \le \epsilon_0/2 + \sigma(v, m - 2N, m + k + 2N, y_{m-2N}, y_{m+k+2N}).$$

By this inequality and the definition of $\epsilon_0$ (see (4.5)),

$$\rho(y_i, \bar{x}_i) \le \epsilon, \quad i = m - N, \dots, m + k + N.$$

Together with (4.9) this implies that

$$\rho(x_i, \bar{x}_i) \le \epsilon, \quad i = m, \dots, m + k.$$

Thus we have shown that the property (C) holds. Now we are ready to complete
the proof.

Let $\epsilon > 0$. By the property (C) there exists $\delta_0 \in (0, \epsilon)$ such that for each
pair of integers $m_1, m_2 > m_1$ and each sequence $\{x_i\}_{i=m_1}^{m_2} \subset X$ satisfying

$$\rho(x_i, \bar{x}_i) \le \delta_0, \quad i = m_1, m_2,$$

$$\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + \delta_0 \tag{4.11}$$

the following inequality holds:
$$\rho(x_i, \bar{x}_i) \le \epsilon, \quad i = m_1, \dots, m_2. \tag{4.12}$$

There exist a number $\delta \in (0, \delta_0)$ and a natural number $N$ such that for each
pair of integers $m_1, m_2 \ge m_1 + 2N$ and each sequence $\{x_i\}_{i=m_1}^{m_2} \subset X$ which
satisfies

$$\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + \delta \tag{4.13}$$

the following inequality holds:
$$\rho(x_i, \bar{x}_i) \le \delta_0, \quad i = m_1 + N, \dots, m_2 - N. \tag{4.14}$$

Let $m_1, m_2 \ge m_1 + 2N$ be a pair of integers and let $\{x_i\}_{i=m_1}^{m_2} \subset X$ satisfy
(4.13). Then (4.14) is valid. Assume that $\rho(x_{m_1}, \bar{x}_{m_1}) \le \delta$. Then by (4.14)
and (4.13),

$$\rho(x_{m_1+N}, \bar{x}_{m_1+N}) \le \delta_0$$

and

$$\sum_{i=m_1}^{m_1+N-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_1 + N, x_{m_1}, x_{m_1+N}) + \delta.$$

It follows from these relations and the definition of $\delta_0$ (see (4.11) and (4.12))
that
$$\rho(x_i, \bar{x}_i) \le \epsilon, \quad i = m_1, \dots, m_1 + N.$$

Analogously we can show that if $\rho(x_{m_2}, \bar{x}_{m_2}) \le \delta$, then

$$\rho(x_i, \bar{x}_i) \le \epsilon, \quad i = m_2 - N, \dots, m_2.$$


This completes the proof of the theorem. 


References 


[Dzalilov et al. (2001)] Dzalilov, Z., Ivanov, A.F. and Rubinov, A.M. (2001) Difference
inclusions with delay of economic growth, Dynam. Systems Appl., Vol. 10, pp. 283-293.

[Dzalilov et al. (1998)] Dzalilov, Z., Rubinov, A.M. and Kloeden, P.E. (1998) Lyapunov
sequences and a turnpike theorem without convexity, Set-Valued Analysis, Vol. 6,
pp. 277-302.

[Leizarowitz (1985)] Leizarowitz, A. (1985) Infinite horizon autonomous systems with
unbounded cost, Appl. Math. and Opt., Vol. 13, pp. 19-43.

[Leizarowitz (1989)] Leizarowitz, A. (1989) Optimal trajectories on infinite horizon
deterministic control systems, Appl. Math. and Opt., Vol. 19, pp. 11-32.

[Leizarowitz and Mizel (1989)] Leizarowitz, A. and Mizel, V.J. (1989) One dimensional
infinite horizon variational problems arising in continuum mechanics, Arch. Rational
Mech. Anal., Vol. 106, pp. 161-194.

[Makarov, Levin and Rubinov (1995)] Makarov, V.L., Levin, M.J. and Rubinov, A.M.
(1995) Mathematical economic theory: pure and mixed types of economic mechanisms,
North-Holland, Amsterdam.

[Makarov and Rubinov (1973)] Makarov, V.L. and Rubinov, A.M. (1973) Mathematical
theory of economic dynamics and equilibria, Nauka, Moscow, English trans. (1977):
Springer-Verlag, New York.

[Mamedov and Pehlivan (2000)] Mamedov, M.A. and Pehlivan, S. (2000) Statistical
convergence of optimal paths, Math. Japon., Vol. 52, pp. 51-55.

[Mamedov and Pehlivan (2001)] Mamedov, M.A. and Pehlivan, S. (2001) Statistical
cluster points and turnpike theorem in nonconvex problems, J. Math. Anal. Appl., Vol. 256,
pp. 686-693.

[Marcus and Zaslavski (1999)] Marcus, M. and Zaslavski, A.J. (1999) The structure of
extremals of a class of second order variational problems, Ann. Inst. H. Poincaré, Anal.
Non Linéaire, Vol. 16, pp. 593-629.

[McKenzie (1976)] McKenzie, L.W. (1976) Turnpike theory, Econometrica, Vol. 44,
pp. 841-866.

[Radner (1961)] Radner, R. (1961) Paths of economic growth that are optimal with regard
only to final states; a turnpike theorem, Rev. Econom. Stud., Vol. 28, pp. 98-104.

[Rubinov (1980)] Rubinov, A.M. (1980) Superlinear multivalued mappings and their
applications to problems of mathematical economics, Nauka, Leningrad.

[Rubinov (1984)] Rubinov, A.M. (1984) Economic dynamics, J. Soviet Math., Vol. 26,
pp. 1975-2012.

[Samuelson (1965)] Samuelson, P.A. (1965) A catenary turnpike theorem involving
consumption and the golden rule, American Economic Review, Vol. 55, pp. 486-496.

[Zaslavski (1995)] Zaslavski, A.J. (1995) Optimal programs on infinite horizon, 1 and 2,
SIAM Journal on Control and Optimization, Vol. 33, pp. 1643-1686.

[Zaslavski (1996)] Zaslavski, A.J. (1996) Dynamic properties of optimal solutions of
variational problems, Nonlinear Analysis: Theory, Methods and Applications, Vol. 27,
pp. 895-932.

[Zaslavski (2000)] Zaslavski, A.J. (2000) Turnpike theorem for nonautonomous infinite
dimensional discrete-time control systems, Optimization, Vol. 48, pp. 69-92.


Chapter 8 
Mond-—Weir Duality 


B. Mond 


Abstract Consider the nonlinear programming problem to minimize $f(x)$
subject to $g(x) \le 0$. The initial dual to this problem given by Wolfe required
that all the functions be convex. Since that time there have been many ex- 
tensions that allowed the weakening of the convexity conditions. These gen- 
eralizations include pseudo- and quasi-convexity, invexity, and second order 
convexity. Another approach is that of Mond and Weir who modified the 
dual problem so as to weaken the convexity requirements. Here we summa- 
rize and compare some of these different approaches. It will also be pointed 
out how the two different dual problems (those of Wolfe and Mond—Weir) 
can be combined. Some applications, particularly to fractional programming, 
will be discussed. 


Key words: Mond—Weir dual, linear programming, Wolfe dual 


8.1 Preliminaries 


One of the most interesting and useful aspects of linear programming is duality
theory. Thus to the problem minimize $c^t x$ subject to $Ax \ge b$, $x \ge 0$
(where $A$ is an $m \times n$ matrix) there corresponds the dual problem maximize
$b^t y$ subject to $A^t y \le c$, $y \ge 0$. Duality theory says that for any feasible $x$
and $y$, $c^t x \ge b^t y$; and, if $x_0$ is optimal for the primal problem, there exists
an optimal $y_0$ of the dual and $c^t x_0 = b^t y_0$.
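
The two statements above (weak duality and equality at the optimum) can be checked numerically on a small instance. The sketch below is only an illustration: the data `A`, `b`, `c` and the grid of test points are assumptions chosen for this example, not taken from the text, and the grid search stands in for a proper LP solver.

```python
from itertools import product

# Illustrative data (an assumption, not from the text):
# primal: minimize c^t x subject to A x >= b, x >= 0.
A = [[1.0, 1.0], [1.0, 2.0]]
b = [4.0, 6.0]
c = [2.0, 3.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x, tol=1e-9):
    """x >= 0 and A x >= b."""
    return all(xi >= -tol for xi in x) and all(
        dot(row, x) >= bi - tol for row, bi in zip(A, b))

def dual_feasible(y, tol=1e-9):
    """y >= 0 and A^t y <= c."""
    columns = list(zip(*A))
    return all(yi >= -tol for yi in y) and all(
        dot(col, y) <= ci + tol for col, ci in zip(columns, c))

grid = [i * 0.5 for i in range(17)]  # 0.0, 0.5, ..., 8.0
primal_pts = [x for x in product(grid, repeat=2) if primal_feasible(x)]
dual_pts = [y for y in product(grid, repeat=2) if dual_feasible(y)]

# Weak duality: every feasible primal value dominates every feasible dual value.
weak_duality_ok = all(dot(c, x) >= dot(b, y) - 1e-9
                      for x in primal_pts for y in dual_pts)

# On this instance the grid contains the optimal vertices x = (2, 2), y = (1, 1).
best_primal = min(dot(c, x) for x in primal_pts)
best_dual = max(dot(b, y) for y in dual_pts)
print(weak_duality_ok, best_primal, best_dual)
```

The grid is deliberately coarse; it happens to contain the optimal vertices of this particular instance, which is why the two best values coincide.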
The first extension of this duality theory was to quadratic programming. In
[4], Dorn considered the quadratic program with linear constraints minimize
$\frac{1}{2} x^t C x + p^t x$ subject to $Ax \ge b$, $x \ge 0$, where $C$ is a positive semi-definite
matrix. Dorn [4] showed that this problem was dual to the quadratic problem
maximize $-\frac{1}{2} u^t C u + b^t y$ subject to $A^t y \le Cu + p$, $y \ge 0$. Weak duality holds
and if $x_0$ is optimal for the primal, there exists $y_0$ such that $(u = x_0, y_0)$
is optimal for the dual with equality of objective functions. The requirement
that $C$ be positive semi-definite ensures that the objective function is
convex.


8.2 Convexity and Wolfe duality 


Let $f$ be a function from $R^n$ into $R$ and $0 \le \lambda \le 1$. $f$ is said to be convex if
for all $x, y \in R^n$,

$$f(\lambda y + (1 - \lambda)x) \le \lambda f(y) + (1 - \lambda) f(x).$$

If $f$ is differentiable, this is equivalent to

$$f(y) - f(x) \ge (y - x)^t \nabla f(x). \tag{8.1}$$

Although there are other characterizations of convex functions, we shall assume
that all functions have a continuous derivative and shall use the
description (8.1) for convex functions.


The extension of duality theory to convex non-linear programming prob- 
lems (with convex constraints) was first given by Wolfe [17]. He considered 
the problem 

(P) minimize $f(x)$ subject to $g(x) \le 0$,

and proposed the dual

(WD) maximize $f(u) + y^t g(u)$

subject to $\nabla f(u) + \nabla y^t g(u) = 0$, $y \ge 0$.

Assuming $f$ and $g$ are convex, weak duality holds since, for feasible $x$ and
$(u, y)$,

$$f(x) - f(u) - y^t g(u) \ge (x - u)^t \nabla f(u) - y^t g(u)$$
$$= -(x - u)^t \nabla y^t g(u) - y^t g(u) \ge -y^t g(x) \ge 0.$$

He also showed that if $x_0$ is optimal for (P) and a constraint qualification is
satisfied, then there exists $y_0$ such that $(u = x_0, y_0)$ is optimal for (WD) and
for these optimal vectors the objective functions are equal.
8.3 Fractional programming and some extensions
of convexity


Simultaneously with the development of convex programming, there was
consideration of the fractional programming problem

(FP) minimize $f(x)/g(x)$ ($g(x) > 0$) subject to $h(x) \le 0$,
and in particular, the linear fractional programming problem

(LFP) minimize $(c^t x + \alpha)/(d^t x + \beta)$, $(d^t x + \beta) > 0$,
subject to $Ax \ge b$, $x \ge 0$.

It was noted (see, e.g., Martos [9]) that many of the features of linear 
programming, such as duality and the simplex method, are easily adapted to 
linear fractional programming, although the objective function is not convex. 

This led to consideration of conditions weaker than convexity. Mangasarian [6]
defined pseudo-convex functions that satisfy

$$(y - x)^t \nabla f(x) \ge 0 \Rightarrow f(y) - f(x) \ge 0.$$

Also useful are quasi-convex functions that satisfy

$$f(y) - f(x) \le 0 \Rightarrow (y - x)^t \nabla f(x) \le 0.$$

It can be shown [6] that if $f$ is convex, $\ge 0$ and $g$ concave, $> 0$ (or $f$ convex, $g$
linear $> 0$) then $f/g$ is pseudo-convex. It follows from this that the objective
function in (LFP) is pseudo-convex.

Mangasarian [7] points out that whereas some results (such as sufficiency 
and converse duality) hold if, in (P), f is only pseudo-convex and g quasi- 
convex, Wolfe duality does not hold for such functions. One example is the 
following: 

minimize $x^3 + x$ subject to $x \ge 1$, which has the optimal value 2 at $x = 1$.
The Wolfe dual

maximize $u^3 + u + y(1 - u)$

subject to $3u^2 + 1 - y = 0$, $y \ge 0$

can be shown to $\to +\infty$ as $u \to -\infty$.
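
The unboundedness of this Wolfe dual is easy to see numerically. The following sketch eliminates the multiplier through the equality constraint and evaluates the dual objective at a few increasingly negative values of $u$; the function name and the sample points are assumptions made for illustration.

```python
def wolfe_dual_objective(u):
    """Wolfe dual objective of: minimize x^3 + x subject to 1 - x <= 0.

    The multiplier is eliminated through the constraint 3u^2 + 1 - y = 0,
    so y = 3u^2 + 1 >= 0 holds automatically for every real u.
    """
    y = 3.0 * u**2 + 1.0
    return u**3 + u + y * (1.0 - u)

primal_optimum = 1.0**3 + 1.0  # optimal primal value 2, attained at x = 1

# Dual values at u = -1, -10, -100, -1000 (algebraically: -2u^3 + 3u^2 + 1).
values = [wolfe_dual_objective(-(10.0**k)) for k in range(4)]
print(values)
```

Every dual-feasible value listed exceeds the primal optimum, so weak duality fails for this pair, in line with the remark that Wolfe duality need not hold under pseudo-convexity alone.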

One of the reasons that, in Wolfe duality, convexity cannot be weakened
to pseudo-convexity is that, unlike for convex functions, the sum of two
pseudo-convex functions need not be pseudo-convex. It is easy to see, however,
that duality between (P) and the Wolfe dual (WD) does hold if the
Lagrangian $f + y^t g$ ($y \ge 0$) is pseudo-convex. We show that weak duality
holds.

Since $\nabla[f(u) + y^t g(u)] = 0$, we have

$$(x - u)^t \nabla[f(u) + y^t g(u)] \ge 0 \Rightarrow f(x) + y^t g(x) \ge f(u) + y^t g(u).$$

Now since $y^t g(x) \le 0$, $f(x) \ge f(u) + y^t g(u)$.
8.4 Mond-Weir dual


In order to weaken the convexity requirements, Mond and Weir [12] proposed
a different dual to (P).
(MWD) maximize $f(u)$
subject to $\nabla f(u) + \nabla y^t g(u) = 0$,
$y^t g(u) \ge 0$, $y \ge 0$.


Theorem 1 (Weak duality). If $f$ is pseudo-convex and $y^t g$ is quasi-convex,
then

$$f(x) \ge f(u).$$
Proof. Since $y^t g(x) \le 0 \le y^t g(u)$,
$y^t g(x) - y^t g(u) \le 0 \Rightarrow (x - u)^t \nabla y^t g(u) \le 0$.
Hence $-(x - u)^t \nabla f(u) \le 0$, or
$(x - u)^t \nabla f(u) \ge 0 \Rightarrow f(x) \ge f(u)$.


It is easy to see that if also $x_0$ is optimal for (P) and a constraint
qualification is satisfied, then there exists a $y_0$ such that $(u = x_0, y_0)$ is optimal for
(MWD) with equality of objective functions.

Consider again the problem minimize $x^3 + x$ subject to $x \ge 1$, to which
Wolfe duality does not apply. The corresponding Mond-Weir dual, maximize
$u^3 + u$ subject to $3u^2 + 1 - y = 0$, $y(1 - u) \ge 0$, $y \ge 0$, has an optimum at
$u = 1$, $y = 4$ with optimum value equal to 2.
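
A coarse grid search gives a quick sanity check of this example. The sketch below is an illustration only: the helper name and the search interval $[-5, 5]$ are assumptions, and the grid search is not how one would solve such a dual in practice.

```python
def mond_weir_multiplier(u):
    """Return y if u is feasible for the Mond-Weir dual of
    minimize x^3 + x subject to 1 - x <= 0, otherwise None."""
    y = 3.0 * u**2 + 1.0                     # from 3u^2 + 1 - y = 0
    if y >= 0.0 and y * (1.0 - u) >= 0.0:    # y >= 0 and y g(u) >= 0
        return y
    return None

grid = [i / 100.0 for i in range(-500, 501)]  # u in [-5, 5], step 0.01
feasible = [(u, mond_weir_multiplier(u)) for u in grid
            if mond_weir_multiplier(u) is not None]

# Maximize the dual objective u^3 + u over the feasible grid points.
best_u, best_y = max(feasible, key=lambda pair: pair[0]**3 + pair[0])
best_value = best_u**3 + best_u
print(best_u, best_y, best_value)
```

Because $y = 3u^2 + 1 > 0$, the extra constraint $y(1 - u) \ge 0$ simply forces $u \le 1$, which is what keeps the dual bounded here.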

Although many variants of (MWD) are possible (see [12]), we give a dual
that can be regarded as a combination of (WD) and (MWD). Let $M =
\{1, 2, \dots, m\}$ and $I \subseteq M$.

maximize $f(u) + \sum_{i \in I} y_i g_i(u)$
subject to $\nabla f(u) + \nabla y^t g(u) = 0$, $y \ge 0$,
$\sum_{i \in M \setminus I} y_i g_i(u) \ge 0$.

Weak duality holds if $f + \sum_{i \in I} y_i g_i$ is pseudo-convex and
$\sum_{i \in M \setminus I} y_i g_i$ is quasi-convex.


8.5 Applications 


Since Mond-Weir duality holds when the objective function is pseudo-convex
but not convex, it is natural to apply the duality results to the fractional
programming problem (FP). Thus if $f \ge 0$, convex and $g > 0$, concave,
$y^t h$ quasi-convex then weak duality holds between (FP) and the following
problem:

maximize $f(u)/g(u)$
subject to $\nabla[f(u)/g(u) + y^t h(u)] = 0$,
$y^t h(u) \ge 0$, $y \ge 0$.


Other fractional programming duals can be found in Bector [1] and 
Schaible [15, 16]. 
Instead of the problem (FP), Bector [1] considered the equivalent problem

minimize $f(x)/g(x)$
subject to $h(x)/g(x) \le 0$.

Here the Lagrangian is $(f(x) + y^t h(x))/g(x)$ and is pseudo-convex if $f$ and $h$ are
convex, $g$ is concave $> 0$, $f + y^t h \ge 0$ (unless $g$ is linear). Thus his dual to
(FP) is

maximize $(f(u) + y^t h(u))/g(u)$
subject to $\nabla\left[(f(u) + y^t h(u))/g(u)\right] = 0$,
$f(u) + y^t h(u) \ge 0$, $y \ge 0$.

Duality holds if $f$ is convex, $g$ is concave $> 0$, and $h$ is convex.

A dual that combines the fractional dual of Mond—Weir and that of Bector 
can be found in [13]. 

Schaible [15, 16] gave the following dual to (FP):

maximize $\lambda$
subject to $\nabla f(u) - \lambda \nabla g(u) + \nabla y^t h(u) = 0$,
$f(u) - \lambda g(u) + y^t h(u) \ge 0$,
$y \ge 0$, $\lambda \ge 0$.

Duality holds if $f$ is convex, $\ge 0$, $g$ concave, $> 0$ and $h$ is convex.
A Mond-Weir version of the Schaible dual is the following:

maximize $\lambda$
subject to $\nabla f(u) - \lambda \nabla g(u) + \nabla y^t h(u) = 0$,
$f(u) - \lambda g(u) \ge 0$,
$y^t h(u) \ge 0$, $\lambda \ge 0$, $y \ge 0$.

Here duality holds if $f$ is convex and nonnegative, $g$ concave and strictly
positive, and $y^t h$ is quasi-convex.
A dual that is a combination of the last two is the following:

maximize $\lambda$
subject to $\nabla f(u) - \lambda \nabla g(u) + \nabla y^t h(u) = 0$,
$f(u) - \lambda g(u) + \sum_{i \in I} y_i h_i(u) \ge 0$,
$\sum_{i \in M \setminus I} y_i h_i(u) \ge 0$, $\lambda \ge 0$, $y \ge 0$.

Here $\sum_{i \in M \setminus I} y_i h_i(u)$ need only be quasi-convex for duality to hold.
A fractional programming problem where Bector and Schaible duality do
not hold but the Mond-Weir fractional programming duals are applicable is
the following:

minimize $-1/x$ ($x > 0$) subject to $x^3 \ge 1$.

Here neither the Bector nor the Schaible dual is applicable. The Mond-Weir
Bector type dual is

maximize $-1/u$ ($u > 0$)
subject to $y = \dfrac{1}{3u^4}$, $y(1 - u^3) \ge 0$, $y \ge 0$.

The maximum value $-1$ is attained at $u = 1$, $y = 1/3$.

The Mond-Weir Schaible type dual is

maximize $\lambda$
subject to $-\lambda - 3yu^2 = 0$,
$-1 - \lambda u \ge 0$,
$y(1 - u^3) \ge 0$, $y \ge 0$.

The maximum is attained at

$u = 1$, $y = 1/3$, $\lambda = -1$.
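
Both dual solutions quoted for this example can be confirmed by brute force. The sketch below is a rough check under stated assumptions: the helper names and the grids are hypothetical choices for illustration, with $f = -1$, $g(x) = x$ and $h(x) = 1 - x^3$ read off from the problem above.

```python
def bector_type_dual(u):
    """Mond-Weir (Bector type) dual of: minimize -1/x (x > 0), 1 - x^3 <= 0.
    Returns (objective, y) when u is feasible, otherwise None."""
    if u <= 0.0:
        return None
    y = 1.0 / (3.0 * u**4)              # the constraint y = 1 / (3 u^4)
    if y * (1.0 - u**3) >= 0.0:         # y >= 0 holds automatically
        return (-1.0 / u, y)
    return None

def schaible_type_dual(u, y):
    """Mond-Weir (Schaible type) dual: lambda is fixed by -lambda - 3yu^2 = 0.
    Returns lambda when (u, y) is feasible, otherwise None."""
    lam = -3.0 * y * u**2
    if y >= 0.0 and -1.0 - lam * u >= 0.0 and y * (1.0 - u**3) >= 0.0:
        return lam
    return None

# Bector type: scan u in (0, 3].
bector_candidates = [bector_type_dual(i / 100.0) for i in range(1, 301)]
best_bector_value, best_bector_y = max(
    cand for cand in bector_candidates if cand is not None)

# Schaible type: scan a grid of (u, y) pairs that contains u = 1, y = 1/3.
schaible_values = [schaible_type_dual(i / 20.0, j / 30.0)
                   for i in range(1, 41) for j in range(0, 91)]
best_lambda = max(v for v in schaible_values if v is not None)

print(best_bector_value, best_bector_y, best_lambda)
```

Both searches return the value $-1$, which equals the primal optimum $-1/x$ at $x = 1$.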


8.6 Second order duality 

Mangasarian [8] proposed the following second order dual to (P).

(MD) maximize $f(u) + y^t g(u) - \frac{1}{2} p^t [\nabla^2 f(u) + \nabla^2 y^t g(u)] p$
subject to $\nabla y^t g(u) + \nabla^2 y^t g(u) p + \nabla f(u) + \nabla^2 f(u) p = 0$,
$y \ge 0$.

In [11] Mond established weak duality between (P) and (MD) under the
following conditions: for all $x, u, p$,

$$f(x) - f(u) \ge (x - u)^t \nabla f(u) + (x - u)^t \nabla^2 f(u) p - \tfrac{1}{2} p^t \nabla^2 f(u) p$$

(subsequently called second order convex by Mahajan [5]) and

$$g_i(x) - g_i(u) \ge (x - u)^t \nabla g_i(u) + (x - u)^t \nabla^2 g_i(u) p - \tfrac{1}{2} p^t \nabla^2 g_i(u) p, \quad i = 1, \dots, m.$$

The second order convexity requirements can be weakened by suitably
modifying the dual. The corresponding dual is

(MWSD) maximize $f(u) - \frac{1}{2} p^t \nabla^2 f(u) p$
subject to $\nabla y^t g(u) + \nabla^2 y^t g(u) p + \nabla f(u) + \nabla^2 f(u) p = 0$,
$y^t g(u) - \frac{1}{2} p^t [\nabla^2 y^t g(u)] p \ge 0$, $y \ge 0$.

Weak duality holds between (P) and (MWSD) if $f$ satisfies

$$(x - u)^t \nabla f(u) + (x - u)^t \nabla^2 f(u) p \ge 0 \Rightarrow f(x) \ge f(u) - \tfrac{1}{2} p^t \nabla^2 f(u) p$$

(called second order pseudo-convex) and $y^t g$ satisfies

$$y^t g(x) - y^t g(u) + \tfrac{1}{2} p^t \nabla^2 [y^t g(u)] p \le 0 \Rightarrow (x - u)^t \nabla y^t g(u) + (x - u)^t \nabla^2 [y^t g(u)] p \le 0$$

(called second order quasi-convex).
Other second order duals can be found in [13]. 


8.7 Symmetric duality 


In [3], Dantzig, Eisenberg and Cottle formulated the following pair of sym- 
metric dual problems: 


(SP) minimize $K(x, y) - y^t \nabla_2 K(x, y)$
subject to $-\nabla_2 K(x, y) \ge 0$,
$x \ge 0$.
(SD) maximize $K(u, v) - u^t \nabla_1 K(u, v)$
subject to $-\nabla_1 K(u, v) \le 0$,
$v \ge 0$.

Weak duality holds if $K$ is convex in $x$ for fixed $y$ and concave in $y$ for
fixed $x$.

In [12] Mond and Weir considered the possibility of weakening the convexity
and concavity requirements by modifying the symmetric dual problems.
They proposed the following:

(MWSP) minimize $K(x, y)$
subject to $-\nabla_2 K(x, y) \ge 0$,
$-y^t \nabla_2 K(x, y) \le 0$,
$x \ge 0$.
(MWSD) maximize $K(u, v)$
subject to $-\nabla_1 K(u, v) \le 0$,
$-u^t \nabla_1 K(u, v) \ge 0$,
$v \ge 0$.

Weak duality holds if $K$ is pseudo-convex in $x$ for fixed $y$ and pseudo-concave
in $y$ for fixed $x$.

Proof. $(x - u)^t \nabla_1 K(u, v) \ge 0 \Rightarrow K(x, v) \ge K(u, v)$,

$(v - y)^t \nabla_2 K(x, y) \le 0 \Rightarrow K(x, v) \le K(x, y)$.

Hence $K(x, y) \ge K(u, v)$.

Once symmetric duality was shown to hold with only pseudo-convex and 
pseudo-concave requirements, it was tempting to try to establish a pair of 
symmetric dual fractional problems. Such a pair is given in [2]. 

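minimize $\phi(x, y)/\psi(x, y)$
subject to $\psi(x, y) \nabla_2 \phi(x, y) - \phi(x, y) \nabla_2 \psi(x, y) \le 0$,
$y^t [\psi(x, y) \nabla_2 \phi(x, y) - \phi(x, y) \nabla_2 \psi(x, y)] \ge 0$,
$x \ge 0$.

maximize $\phi(u, v)/\psi(u, v)$
subject to $\psi(u, v) \nabla_1 \phi(u, v) - \phi(u, v) \nabla_1 \psi(u, v) \ge 0$,
$u^t [\psi(u, v) \nabla_1 \phi(u, v) - \phi(u, v) \nabla_1 \psi(u, v)] \le 0$,
$v \ge 0$.

Assuming that $\phi(\cdot, y)$ and $\psi(x, \cdot)$ are convex while $\phi(x, \cdot)$ and $\psi(\cdot, y)$ are
concave, the objective function is pseudo-convex in $x$ for fixed $y$ and
pseudo-concave in $y$ for fixed $x$. In this case weak duality holds, i.e., for
feasible $(x, y)$ and $(u, v)$,

$$\phi(x, y)/\psi(x, y) \ge \phi(u, v)/\psi(u, v).$$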


Finally we point out that Mond-Weir duality has been found to be useful
and applicable in a great many different contexts. A recent check of Math
Reviews showed 112 papers where the term Mond-Weir is used either in the
title or in the abstract. Seventy-eight of these papers are listed in [10].
Chapter 9


Computing the fundamental matrix 
of an M/G/1-type Markov chain 


Emma Hunt 


Abstract A treatment is given of a probabilistic approach, Algorithm H, to 
the determination of the fundamental matrix of a block-structured M/G/1- 
type Markov chain. Comparison is made with the cyclic reduction algorithm. 


Key words: Block Markov chain, fundamental matrix, Algorithm H, 
convergence rates, LR Algorithm, CR Algorithm 


9.1 Introduction 


By a partitioned or block-M/G/1 Markov chain we mean a Markov chain 
with transition matrix of block-partitioned form 


$$P = \begin{pmatrix} B_1 & B_2 & B_3 & B_4 & \cdots \\ A_0 & A_1 & A_2 & A_3 & \cdots \\ 0 & A_0 & A_1 & A_2 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix},$$

where each block is $k \times k$, say. We restrict attention to the case where the
chain is irreducible but do not suppose positive recurrence. If the states are
partitioned conformably with the blocks, then the states corresponding to
block $\ell$ ($\ge 0$) are said to make up level $\ell$ and to constitute the phases of level
$\ell$. The $j$-th phase of level $\ell$ will be denoted $(\ell, j)$.

In [18] Neuts noted a variety of special cases of the block-M/G/1 Markov
chain which occur as models in various applications in the literature, such as
Bailey's bulk queue (pp. 66-69) and the Odoom-Lloyd-Ali Khan-Gani dam
(pp. 69-71 and 348-353).

For applications, the most basic problem concerning the block-M/G/1
Markov chain is finding the invariant probability measure in the positive
recurrent case. We express this measure as $\pi = (\pi_0, \pi_1, \dots)$, the components
$\pi_i$ being $k$-dimensional vectors so that $\pi$ is partitioned conformably with the
structure of $P$. An efficient and stable method of determining $\pi$ has been
devised by Ramaswami [20] based on a matrix version of Burke's formula.
The key ingredient here is the fundamental matrix, $G$. This arises as follows.

Denote by $G_{r,\ell}$ ($1 \le r, \ell \le k$) the probability, given the chain begins in
state $(i + 1, r)$, that it subsequently reaches level $i \ge 0$ and that it first
does so by entering the state $(i, \ell)$. By the homogeneity of the transition
probabilities in levels one and above, plus the fact that trajectories are skip-free
downwards in levels, the probability $G_{r,\ell}$ is well defined and independent
of $i$. The fundamental matrix $G$ is defined by $G = (G_{r,\ell})$.

A central property of G, of which we shall make repeated use, is that it is 
the smallest nonnegative solution to the matrix equation 


$$G = A(G) := \sum_{i=0}^{\infty} A_i G^i, \tag{9.1}$$

where we define $G^0 = I$ [9, Sections 2.2, 2.3].
Now define

$$\bar{A}_i = \sum_{j=i}^{\infty} A_j G^{j-i}, \qquad \bar{B}_i = \sum_{j=i}^{\infty} B_j G^{j-i}, \qquad i \ge 1. \tag{9.2}$$

The Ramaswami generalization of the Burke formula to block-M/G/1 
Markov chains is as follows (Neuts [18, Theorem 3.2.5]). 


Theorem A. For a positive recurrent block-M/G/1 chain $P$, the matrix
$I - \bar{A}_1$ is invertible and the invariant measure of $P$ satisfies


i-1 
mT = | 70Bz +) a Bha; (i= At)? (i> 1). 


gat 


The determination of $\pi_0$ is discussed in Neuts [18, Section 3.3]. The theorem
may then be used to derive the vectors $\pi_i$ once $G$ is known. Thus the
availability of efficient numerical methods for computing the matrix $G$ is
crucial for the calculation of the invariant measure.

Different algorithms for computing the minimal nonnegative solution of 
(9.1) have been proposed and analyzed by several authors. Many of them 
arise from functional iteration techniques based on manipulations of (9.1). 
For instance, in Ramaswami [19] the iteration 
$$X_{j+1} = \sum_{i=0}^{\infty} A_i X_j^i, \qquad X_0 = 0, \tag{9.3}$$

was considered. Similar techniques, based on the recurrences

$$X_{j+1} = (I - A_1)^{-1} A_0 + \sum_{i=2}^{\infty} (I - A_1)^{-1} A_i X_j^i \tag{9.4}$$

or

$$X_{j+1} = \Big(I - \sum_{i=1}^{\infty} A_i X_j^{i-1}\Big)^{-1} A_0 \tag{9.5}$$

were introduced in Neuts [18], Latouche [12] in order to speed up the con- 
vergence. However, the convergence of these numerical schemes still remains 
linear. In Latouche [13] a Newton iteration was introduced in order to arrive 
at a quadratic convergence, with an increase in the computational cost. In 
Latouche and Stewart [15] the approximation of G was reduced to solving 
nested finite systems of linear equations associated with the matrix P by 
means of a doubling technique. In this way the solution for the matrix P 
is approximated with the solution of the problem obtained by cutting the 
infinite block matrix P to a suitable finite block size n. 
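
To make the functional iterations concrete, here is a minimal sketch that applies the natural iteration (9.3) and an inverse-based recurrence of the form $X_{j+1} = (I - \sum_{i \ge 1} A_i X_j^{i-1})^{-1} A_0$ to a chain with finitely many nonzero blocks. The helper names, the pure-Python matrix routines and the $1 \times 1$ test blocks are all assumptions made for illustration; the stopping rule is a simple change-based tolerance, not one taken from the cited papers.

```python
def mat_mul(P, Q):
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def mat_add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def mat_sub(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def mat_inv(P):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    k = len(P)
    aug = [list(row) + e for row, e in zip(P, identity(k))]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def horner(blocks, X, first):
    """Return sum_{i >= first} blocks[i] X^(i - first) for finitely many blocks."""
    k = len(X)
    acc = [[0.0] * k for _ in range(k)]
    for i in range(len(blocks) - 1, first - 1, -1):
        acc = mat_add(mat_mul(acc, X), blocks[i])
    return acc

def step_natural(blocks, X):
    """One step of X <- sum_i A_i X^i."""
    return horner(blocks, X, 0)

def step_inverse(blocks, X):
    """One step of X <- (I - sum_{i>=1} A_i X^(i-1))^{-1} A_0."""
    k = len(X)
    return mat_mul(mat_inv(mat_sub(identity(k), horner(blocks, X, 1))), blocks[0])

def iterate(step, blocks, tol=1e-13, max_iter=100000):
    """Iterate from X_0 = 0 until successive iterates differ by less than tol."""
    k = len(blocks[0])
    X = [[0.0] * k for _ in range(k)]
    for n in range(1, max_iter + 1):
        X_new = step(blocks, X)
        change = max(abs(a - b) for ra, rb in zip(X_new, X)
                     for a, b in zip(ra, rb))
        X = X_new
        if change < tol:
            return X, n
    return X, max_iter

# Hypothetical 1 x 1 blocks A_0, A_1, A_2; the minimal solution of
# g = 0.3 + 0.2 g + 0.5 g^2 is g = 0.6.
blocks = [[[0.3]], [[0.2]], [[0.5]]]
G_nat, n_nat = iterate(step_natural, blocks)
G_inv, n_inv = iterate(step_inverse, blocks)
print(G_nat, n_nat, G_inv, n_inv)
```

On this toy example both schemes converge linearly to the same limit, and the inverse-based recurrence needs noticeably fewer steps, consistent with the remark that such variants speed up convergence without changing its order.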

In this chapter we present a probabilistic algorithm, Algorithm H, for the
determination of the fundamental matrix $G$ in a structured M/G/1 Markov
chain. An account of the basic idea is given by the author in [11]. Algorithm
H is developed in the following three sections. In Section 5 we then consider
an alternative approach to the calculation of $G$. This turns out to be rather
more complicated than Algorithm H in general. We do not directly employ
this algorithm, Algorithm $\bar{H}$, and we do not detail all its steps.

However, Algorithm $\bar{H}$ serves several purposes. First, we show that it and
Algorithm H possess an interlacing property. This enables us to use it to
obtain (in Section 7) information about the convergence rate of Algorithm H.
Algorithm $\bar{H}$ reduces to Algorithm LR, the logarithmic-reduction algorithm
of Latouche and Ramaswami [14], in the quasi birth-death (QBD) case. Thus
for a QBD the interlacing property holds for Algorithms H and LR. This we
consider in Section 8.

In Section 9 we address the relation between Algorithm H and Bini and 
Meini’s cyclic reduction algorithm, Algorithm CR. Algorithm CR was devel- 
oped and refined in a chain of articles that provided a considerable improve- 
ment over earlier work. See in particular [3]-[10]. We show that Algorithm H 
becomes Algorithm CR under the conditions for which the latter has been 
established. Algorithm H is seen to hold under more general conditions than 
Algorithm CR. It follows from our discussion that, despite a statement to the 
contrary in [8], Algorithm CR is different from Algorithm LR in the QBD 
case. 
9.2 Algorithm H: Preliminaries


It proves convenient to label the levels of the chain A as —1,0,1,2,..., so 
that A is homogeneous in the one-step transition probabilities out of all 
nonnegative levels. Thus the matrix G gives the probabilities relating to first 
transitions into level —1, given the process starts in level 0. Since we are 
concerned with the process only up to the first transition into level —1, we 
may without loss of generality change the transition probabilities out of level 
—1 and make each phase of level —1 absorbing. That is, we replace our chain 
A with a chain A with levels —1,0,1,2,... and structured one-step transition 
matrix 


Most of our analysis will be in terms of the (substochastic) subchain $\mathcal{A}_0$
with levels 0, 1, 2, ... and structured one-step transition matrix

$$P^{(0)} = \begin{pmatrix} A_1 & A_2 & A_3 & \cdots \\ A_0 & A_1 & A_2 & \cdots \\ 0 & A_0 & A_1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$

The assumption that $\mathcal{A}$ is irreducible entails that every state in a
nonnegative-labeled level of $\mathcal{A}$ has access to level $-1$. Hence all the states
of $\mathcal{A}_0$ are transient or ephemeral.

For $t \ge 0$, let $X_t$ denote the state of $\mathcal{A}_0$ at time $t$ and let the random
variable $Y_t$ represent the level of $\mathcal{A}_0$ at time $t$. For $r, s \in K := \{1, 2, \dots, k\}$
we define

$$U_{r,s} := P\Big(\bigcup_{t>0} \{X_t = (0, s),\ Y_u > 0\ (0 < u < t)\} \,\Big|\, X_0 = (0, r)\Big).$$

Thus $U_{r,s}$ is the probability that, starting in $(0, r)$, the process $\mathcal{A}_0$ revisits
level 0 at some subsequent time and does so with first entry into state $(0, s)$.

The matrix $U := (U_{r,s})$ may be regarded as the one-step transition matrix
of a Markov chain $\mathcal{U}$ on the finite state space $K$. The chain $\mathcal{U}$ is a censoring
of $\mathcal{A}_0$ in which the latter is observed only on visits to level zero. No state of
$\mathcal{U}$ is recurrent, for if $r \in K$ were recurrent then the state $(0, r)$ in $\mathcal{A}_0$ would
be recurrent, which is a contradiction. Since no state of $\mathcal{U}$ is recurrent, $I - U$
is invertible and

$$\sum_{i=0}^{\infty} U^i = (I - U)^{-1}.$$

The matrix $U$ is also strictly substochastic, that is, at least one row sum is
strictly less than unity.

In $\mathcal{A}$, any path whose probability contributes to $G_{r,s}$ begins in $(0, r)$, makes
some number $n \ge 0$ of revisits to level 0 with $-1$ as a taboo level, and then
takes a final step to $(-1, s)$. Suppose the final step to $(-1, s)$ is taken from
$(0, m)$. Allowing for all possible choices of $m$, we derive

$$G_{r,s} = \sum_{m \in K} \Big(\sum_{i=0}^{\infty} U^i\Big)_{r,m} (A_0)_{m,s},$$

so that

$$G = \Big(\sum_{i=0}^{\infty} U^i\Big) A_0 = (I - U)^{-1} A_0.$$

Our strategy for finding $G$ is to proceed via the determination of $U$.
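For $\ell > 0$, we write $U(\ell)$ for the matrix whose entries are given by

$$(U(\ell))_{r,s} := P\Big(\bigcup_{t>0} \{X_t = (0, s),\ 0 < Y_u < \ell\ (0 < u < t)\} \,\Big|\, X_0 = (0, r)\Big)$$

for $r, s \in K$. Thus $U(\ell)$ corresponds to $U$ when the trajectories in $\mathcal{A}_0$ are
further restricted not to reach level $\ell$ or higher before a first return to
level 0.

We may argue as above that $I - U(\ell)$ is invertible and that

$$\sum_{i=0}^{\infty} [U(\ell)]^i = [I - U(\ell)]^{-1}.$$

Further, since $U$ is finite,
$$U(\ell) \uparrow U \quad \text{as } \ell \to \infty$$
and
$$[I - U(\ell)]^{-1} \uparrow (I - U)^{-1} \quad \text{as } \ell \to \infty.$$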

The probabilistic construction we are about to detail involves the exact
algorithmic determination (to machine precision) of $U(\ell)$ for $\ell$ of the form
$2^N$ with $N$ a nonnegative integer. This leads to an approximation

$$T_N := [I - U(2^N)]^{-1} A_0$$

for $G$. We have
$$T_N \uparrow G \quad \text{as } N \to \infty.$$

The matrix $T_N$ may be interpreted as the contribution to $G$ from those
trajectories from level 0 to level $-1$ in $\mathcal{A}$ that are restricted to pass through
only levels below $2^N$.
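
The monotone convergence of $T_N$ can be illustrated in the simplest possible setting: a scalar ($k = 1$) chain with only $A_0$, $A_1$, $A_2$ nonzero. The sketch below is an illustration under stated assumptions, not Algorithm H itself: the numbers `q`, `s`, `p` are hypothetical, and $U(\ell)$ is obtained from the classical gambler's-ruin formula for a simple random walk rather than from the censoring construction developed in the next section.

```python
def u_truncated(q, s, p, ell):
    """U(ell) for a scalar chain with A_0 = q (down), A_1 = s (stay), A_2 = p (up).

    A first return to level 0 below level ell either stays in level 0 (prob s)
    or steps up to level 1 (prob p) and then hits 0 before ell.  The latter
    probability is the gambler's-ruin value (r - r^ell) / (1 - r^ell) with
    r = q / p, which assumes q != p.
    """
    r = q / p
    hit_zero_first = (r - r**ell) / (1.0 - r**ell)
    return s + p * hit_zero_first

def t_approx(q, s, p, N):
    """Scalar version of T_N = [I - U(2^N)]^{-1} A_0."""
    return q / (1.0 - u_truncated(q, s, p, 2**N))

q, s, p = 0.3, 0.2, 0.5          # hypothetical values with q + s + p = 1
approximations = [t_approx(q, s, p, N) for N in range(6)]
limit = q / p                     # minimal root of g = q + s g + p g^2 when p > q
print(approximations, limit)
```

In this example the error after $N$ steps is of order $(q/p)^{2^N}$, so each increment of $N$ roughly squares the error.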
9.3 Probabilistic construction


We construct a sequence $(\mathcal{A}_j)_{j \ge 0}$ of censored processes, each of which has
the nonnegative integers as its levels. For $j \ge 1$, the levels 0, 1, 2, ... of $\mathcal{A}_j$
are respectively the levels 0, 2, 4, ... of $\mathcal{A}_{j-1}$, that is, $\mathcal{A}_j$ is $\mathcal{A}_{j-1}$ censored to
be observed in even-labeled levels only. Thus $\mathcal{A}_j$ is a process that has been
censored $j$ times. By the homogeneity of one-step transitions out of level 1
and higher levels, $\mathcal{A}_j$ has a structured one-step transition matrix

$$P^{(j)} = \begin{pmatrix} B_1^{(j)} & B_2^{(j)} & B_3^{(j)} & \cdots \\ A_0^{(j)} & A_1^{(j)} & A_2^{(j)} & \cdots \\ 0 & A_0^{(j)} & A_1^{(j)} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix},$$

that is, each chain $\mathcal{A}_j$ is of structured M/G/1 type. We have $B_i^{(0)} = A_i$ for
$i \ge 1$ and $A_i^{(0)} = A_i$ for $i \ge 0$.

In the previous section we saw that $\mathcal{A}_0$ contains no recurrent states, so
the same must be true also for the censorings $\mathcal{A}_1, \mathcal{A}_2, \dots$. The substochastic
matrices $B_1^{(j)}$, $A_1^{(j)}$, formed by censoring $\mathcal{A}_j$ to be observed only in levels 0
and 1 respectively, thus also contain no recurrent states. Hence $I - B_1^{(j)}$ and
$I - A_1^{(j)}$ are both invertible.

We now consider the question of deriving the block entries in PY+) from 
those in PY), First we extend our earlier notation and write xl ) Ag ) respec- 
tively for the state and level - A, at time t € {0,1,...}. For h a nonnegative 


integer, define the event 2) ee by 
oF, = x” = (h,s), y,) = y) is even (0<u< y} 


and for n > 0, define the k x k matrix LYth by 


ow), 


for r,s € K. By the homogeneity in positive-labeled levels of the one-step 
transition probabilities in A;, the left-hand side is well defined, that is, the 
right-hand side is independent of the value of @ > 0. 


Pl) oe) t,2@+2n ay (2¢ + 1, r) 


t>0 


We may interpret (LY ame a as the probability, conditional on initial state 
(2€+ 1,r), that the first transition to an even-labeled level is to state (20 + 
2n, s). 

We may express the transitions in $\mathcal{A}_{j+1}$ in terms of those of $\mathcal{A}_j$ and the matrices $L_n^{(j+1)}$ by an enumeration of possibilities. Suppose $i > 0$. A single-step transition from state $(i, r)$ to state $(i-1+n, s)$ ($n \ge 0$) in $\mathcal{A}_{j+1}$ corresponds to a transition (possibly multistep) from state $(2i, r)$ to state $(2(i-1+n), s)$ in $\mathcal{A}_j$ that does not involve passage through any other state with even-labeled level. When $n > 0$, this may occur in a single step with probability $(A_{2n-1}^{(j)})_{r,s}$. For a multistep transition, we may obtain the probability by conditioning on the first state lying in an odd-labeled level. This gives

$$A_n^{(j+1)} = A_{2n-1}^{(j)} + \sum_{m=0}^{n} A_{2m}^{(j)} L_{n-m}^{(j+1)} \quad (n \ge 1). \tag{9.6}$$

For $n = 0$, there is no single-step transition in $\mathcal{A}_j$ producing a drop of two levels, so the leading term disappears to give

$$A_0^{(j+1)} = A_0^{(j)} L_0^{(j+1)}. \tag{9.7}$$


A similar argument gives

$$B_n^{(j+1)} = B_{2n-1}^{(j)} + \sum_{m=1}^{n} B_{2m}^{(j)} L_{n-m}^{(j+1)} \quad (n \ge 1) \tag{9.8}$$

for transitions from level 0.
The determination of the matrices $L_n^{(j+1)}$ proceeds in two stages. For $n \ge 0$, define the $k \times k$ matrix $K_n^{(j+1)}$ by

$$\left(K_n^{(j+1)}\right)_{r,s} = P\left[\, \bigcup_{t \ge 0} \Phi^{(j)}_{s,t,2\ell+2n+1} \,\Big|\, X_0^{(j)} = (2\ell+1, r) \right]$$

for $r, s \in \mathcal{K}$. Again the left-hand side is well defined. We may interpret this as follows. Suppose $\mathcal{A}_j$ is initially in state $(2\ell+1, r)$. The $(r,s)$ entry in $K_n^{(j+1)}$ is the probability that, at some subsequent time point, $\mathcal{A}_j$ is in state $(2\ell+2n+1, s)$ without in the meantime having been in any even-labeled level.

Each path in $\mathcal{A}_j$ contributing to $L_n^{(j+1)}$ consists of a sequence of steps each of which involves even-sized changes of level, followed by a final step with an odd-sized change of level. Conditioning on the final step yields

$$L_n^{(j+1)} = \sum_{m=0}^{n} K_m^{(j+1)} A_{2(n-m)}^{(j)} \quad (n \ge 0). \tag{9.9}$$


To complete the specification of $P^{(j+1)}$ in terms of $P^{(j)}$, we need to determine the matrices $K_n^{(j+1)}$. We have by definition that

$$\left(K_0^{(j+1)}\right)_{r,s} = P\left[\, \bigcup_{t \ge 0} \Phi^{(j)}_{s,t,2\ell+1} \,\Big|\, X_0^{(j)} = (2\ell+1, r) \right].$$

Since $\mathcal{A}_j$ is skip-free to the left, trajectories contributing to $K_0^{(j+1)}$ cannot change level and so

$$K_0^{(j+1)} = \sum_{i=0}^{\infty} \left(A_1^{(j)}\right)^i = \left(I - A_1^{(j)}\right)^{-1}. \tag{9.10}$$

For $n > 0$, $K_n^{(j+1)}$ involves at least one step in $\mathcal{A}_j$ with an increase in level. Conditioning on the last such step yields the recursive relation

$$K_n^{(j+1)} = \sum_{m=0}^{n-1} K_m^{(j+1)} A_{2(n-m)+1}^{(j)} K_0^{(j+1)}. \tag{9.11}$$

We may also develop a recursion by conditioning on the first such jump between levels. This gives the alternative recursive relation

$$K_n^{(j+1)} = K_0^{(j+1)} \sum_{m=0}^{n-1} A_{2(n-m)+1}^{(j)} K_m^{(j+1)}. \tag{9.12}$$


Since level 1 in $\mathcal{A}_N$ corresponds to level $2^N$ in $\mathcal{A}_0$, paths in $\mathcal{A}_N$ from $(0, r)$ to $(0, s)$ that stay within level 0 correspond to paths from $(0, r)$ to $(0, s)$ in $\mathcal{A}_0$ that do not reach level $2^N$ or higher. Hence

$$\left(B_1^{(N)}\right)_{r,s} = \left(U(2^N)\right)_{r,s}$$

for $r, s \in \mathcal{K}$, or

$$B_1^{(N)} = U(2^N).$$

Thus the recursive relations connecting the block entries in $P^{(j+1)}$ to those in $P^{(j)}$ for $j = 0, 1, \ldots, N-1$ provide the means to determine $U(2^N)$ exactly and so approximate $G$.


9.4 Algorithm H


In the last section we considered the sequence of censored processes $(\mathcal{A}_j)_{j \ge 0}$, each with the nonnegative integers as its levels. The determination of $B_1^{(N)}$ requires only a finite number of the matrix entries in each $P^{(j)}$ to be determined. For the purpose of calculating $T_N$, the relevant parts of the construction of the previous section may be summarized as follows.

The algorithm requires initial input of $A_0, A_1, \ldots, A_{2^N-1}$. First we specify

$$B_n^{(0)} = A_n \quad (n = 1, \ldots, 2^N),$$
$$A_n^{(0)} = A_n \quad (n = 0, 1, \ldots, 2^N - 1).$$

We then determine

$$B_1^{(j)}, B_2^{(j)}, \ldots, B_{2^{N-j}}^{(j)},$$
$$A_0^{(j)}, A_1^{(j)}, \ldots, A_{2^{N-j}-1}^{(j)}$$

recursively for $j = 1, 2, \ldots, N$ as follows. To obtain the block matrices in $\mathcal{A}_{j+1}$ from those in $\mathcal{A}_j$, first find the auxiliary quantities

$$K_0^{(j+1)} = \left[I - A_1^{(j)}\right]^{-1},$$
$$K_n^{(j+1)} = \sum_{m=0}^{n-1} K_m^{(j+1)} A_{2(n-m)+1}^{(j)} K_0^{(j+1)}$$

for $n = 1, 2, \ldots, 2^{N-j-1} - 1$ and

$$L_n^{(j+1)} = \sum_{m=0}^{n} K_m^{(j+1)} A_{2(n-m)}^{(j)}$$

for $n = 0, 1, \ldots, 2^{N-j-1} - 1$. Calculate

$$A_0^{(j+1)} = A_0^{(j)} L_0^{(j+1)}$$

and

$$B_n^{(j+1)} = B_{2n-1}^{(j)} + \sum_{m=1}^{n} B_{2m}^{(j)} L_{n-m}^{(j+1)},$$
$$A_n^{(j+1)} = A_{2n-1}^{(j)} + \sum_{m=0}^{n} A_{2m}^{(j)} L_{n-m}^{(j+1)}$$

for $n = 1, 2, \ldots, 2^{N-j-1} - 1$.

The above suffices for the evaluation of $B_1^{(N)}$. We then compute

$$T_N = \left[I - B_1^{(N)}\right]^{-1} A_0,$$

which is an approximation to $G$ incorporating all contributing paths in $\mathcal{A}$ that do not involve level $2^N$ (or higher). The algorithm may be specified as a short MATLAB program.
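
The steps above can be sketched in a few lines of code. The following is a minimal pure-Python illustration, not the MATLAB program referred to in the text. The helper names (`eye`, `mul`, `inv`, `algorithm_h`, `row_sum_error`) are illustrative choices, blocks that are not supplied are treated as zero, and $B_n^{(j+1)}$ is carried up to $n = 2^{N-j-1}$ so that the full list $B_1^{(j)}, \ldots, B_{2^{N-j}}^{(j)}$ is available at every stage. The test chain is the two-phase pure-birth/pure-death QBD with $A_0 = \mathrm{diag}(1-p, 0)$, off-diagonal $A_1$ entries $p$ and $2p$, and $A_2 = \mathrm{diag}(0, 1-2p)$, taken here as an assumption.

```python
def zeros(k):
    return [[0.0] * k for _ in range(k)]

def eye(k):
    return [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inv(X):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(X)
    aug = [list(row) + unit for row, unit in zip(X, eye(k))]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def algorithm_h(blocks, N):
    """Return T_N from the blocks A_0, A_1, ...; missing blocks count as zero."""
    k = len(blocks[0])
    I, Z = eye(k), zeros(k)
    blk = lambda n: blocks[n] if n < len(blocks) else Z
    size = 2 ** N
    a = [blk(n) for n in range(size)]                  # A_n^(0), n = 0..2^N - 1
    b = [None] + [blk(n) for n in range(1, size + 1)]  # B_n^(0), n = 1..2^N
    for _ in range(N):
        half = size // 2
        K = [inv(sub(I, a[1]))]                        # K_0 = (I - A_1)^-1
        for n in range(1, half):                       # recursion for K_n
            acc = Z
            for m in range(n):
                acc = add(acc, mul(mul(K[m], a[2 * (n - m) + 1]), K[0]))
            K.append(acc)
        L = []
        for n in range(half):                          # L_n from K_m and A_2(n-m)
            acc = Z
            for m in range(n + 1):
                acc = add(acc, mul(K[m], a[2 * (n - m)]))
            L.append(acc)
        new_a = [mul(a[0], L[0])]                      # A_0 of the next stage
        for n in range(1, half):                       # A_n of the next stage
            acc = a[2 * n - 1]
            for m in range(n + 1):
                acc = add(acc, mul(a[2 * m], L[n - m]))
            new_a.append(acc)
        new_b = [None]
        for n in range(1, half + 1):                   # B_n of the next stage
            acc = b[2 * n - 1]
            for m in range(1, n + 1):
                acc = add(acc, mul(b[2 * m], L[n - m]))
            new_b.append(acc)
        a, b, size = new_a, new_b, half
    return mul(inv(sub(I, b[1])), blocks[0])           # T_N = (I - B_1^(N))^-1 A_0

def row_sum_error(T):
    """The error measure ||e - T e||_inf."""
    return max(abs(1.0 - sum(row)) for row in T)

# Assumed test chain: pure-birth/pure-death QBD with p = 0.1.
p = 0.1
A0 = [[1 - p, 0.0], [0.0, 0.0]]
A1 = [[0.0, p], [2 * p, 0.0]]
A2 = [[0.0, 0.0], [0.0, 1 - 2 * p]]
T2 = algorithm_h([A0, A1, A2], 2)
T8 = algorithm_h([A0, A1, A2], 8)
print(row_sum_error(T2), row_sum_error(T8))
```

Plain nested lists keep the sketch dependency-free; for blocks of realistic size a linear-algebra library and a linear solve in place of the explicit inverse would be the more natural choice.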


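9.5 Algorithm $\bar{H}$: Preliminaries


We now consider an Algorithm $\bar{H}$ oriented toward a different way of calculating $G$. This involves two sequences $(\mathcal{M}_j)_{j \ge 1}$, $(\mathcal{N}_j)_{j \ge 1}$ of censored processes, all with levels $-1, 0, 1, 2, \ldots$. The process $\mathcal{M}_j$ has block-structured one-step transition matrix

$$\bar{P}^{(j)} = \begin{bmatrix} I & 0 & 0 & 0 & \cdots \\ \bar{A}_0^{(j)} & \bar{A}_1^{(j)} & \bar{A}_2^{(j)} & \bar{A}_3^{(j)} & \cdots \\ 0 & \bar{A}_0^{(j)} & \bar{A}_1^{(j)} & \bar{A}_2^{(j)} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$$

The process $\mathcal{N}_j$ has block-structured one-step transition matrix

$$\bar{Q}^{(j)} = \begin{bmatrix} I & 0 & 0 & 0 & \cdots \\ \bar{B}_0^{(j)} & 0 & \bar{B}_2^{(j)} & \bar{B}_3^{(j)} & \cdots \\ 0 & \bar{B}_0^{(j)} & 0 & \bar{B}_2^{(j)} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$$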


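These are set up recursively, beginning with $\mathcal{M}_1 = \mathcal{A}$, that is,

$$\bar{A}_i^{(1)} = A_i \quad (i \ge 0). \tag{9.13}$$

We construct $\mathcal{N}_j$ by censoring $\mathcal{M}_j$, observing it only when it is either in level $-1$ or a change of level occurs. Thus for $i \ne 1$,

$$\bar{B}_i^{(j)} = \left[I - \bar{A}_1^{(j)}\right]^{-1} \bar{A}_i^{(j)}. \tag{9.14}$$

We form $\mathcal{M}_{j+1}$ by censoring $\mathcal{N}_j$, observing only the odd-labeled levels $-1, 1, 3, 5, \ldots$, and then relabeling these as $-1, 0, 1, 2, \ldots$. Thus level $\ell \ge -1$ of $\mathcal{M}_{j+1}$ and $\mathcal{N}_{j+1}$ corresponds to level $2(\ell+1)-1$ of $\mathcal{M}_j$ and $\mathcal{N}_j$. It follows that level $\ell$ ($\ge -1$) of $\mathcal{M}_j$ and $\mathcal{N}_j$ corresponds to level $(\ell+1)2^{j-1}-1$ of $\mathcal{M}_1$ and $\mathcal{N}_1$.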
We derive the blocks of $\bar{P}^{(j+1)}$ from those of $\bar{Q}^{(j)}$ as follows. Let $\bar{X}_t^{(j)}$, $\bar{Y}_t^{(j)}$ denote respectively the state and level of $\mathcal{N}_j$ at time $t$. Following the procedure involved in Algorithm H, define for $h$ a nonnegative integer

$$\bar{\Phi}^{(j)}_{s,t,h} = \left\{ \bar{X}_t^{(j)} = (h, s),\ \bar{Y}_u^{(j)} - \bar{Y}_0^{(j)} \text{ is even } (0 < u < t) \right\}.$$

The matrices $\bar{L}_n^{(j+1)}$ are then defined for $n \ge 0$ by

$$\left(\bar{L}_n^{(j+1)}\right)_{r,s} = P\left[\, \bigcup_{t \ge 0} \bar{\Phi}^{(j)}_{s,t,2\ell+2n-1} \,\Big|\, \bar{X}_0^{(j)} = (2\ell, r) \right]$$

for $r, s \in \mathcal{K}$. As before the right-hand side is independent of $\ell \ge 0$. The matrix $\bar{L}_n^{(j+1)}$ plays a similar role to that of $L_n^{(j+1)}$ for Algorithm H, only here the trajectories utilize only even-labeled levels of $\mathcal{N}_j$ except for a final step to an odd-labeled level.

Arguing as before, we derive

$$\bar{A}_n^{(j+1)} = \bar{B}_{2n-1}^{(j)} + \sum_{m=0}^{n} \bar{B}_{2m}^{(j)} \bar{L}_{n-m}^{(j+1)} \quad (n \ge 1) \tag{9.15}$$

with

$$\bar{A}_0^{(j+1)} = \bar{B}_0^{(j)} \bar{L}_0^{(j+1)}. \tag{9.16}$$

The derivation is identical to that leading to (9.7) and (9.6). For $n = 1$ there is no term corresponding to the first term on the right in (9.6) since the present censoring requires $\bar{B}_1^{(j)} = 0$. The present censoring out of even-labeled, as opposed to odd-labeled, levels means that no analogue to (9.8) is needed.


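As before we now determine the matrices $\bar{L}_n^{(j+1)}$ in terms of matrices $\bar{K}_n^{(j+1)}$. We define

$$\left(\bar{K}_n^{(j+1)}\right)_{r,s} = P\left[\, \bigcup_{t \ge 0} \bar{\Phi}^{(j)}_{s,t,2\ell+2n} \,\Big|\, \bar{X}_0^{(j)} = (2\ell, r) \right]$$

for $r, s \in \mathcal{K}$ and $n \ge 0$. As before the right-hand side is independent of $\ell \ge 0$. By analogy with (9.10), we have since $\bar{B}_1^{(j)} = 0$ that

$$\bar{K}_0^{(j+1)} = \left[I - \bar{B}_1^{(j)}\right]^{-1} = I.$$

By analogy with (9.9), we have

$$\bar{L}_n^{(j+1)} = \sum_{m=0}^{n} \bar{K}_m^{(j+1)} \bar{B}_{2(n-m)}^{(j)} = \bar{B}_{2n}^{(j)} + \sum_{m=1}^{n} \bar{K}_m^{(j+1)} \bar{B}_{2(n-m)}^{(j)} \quad (n \ge 0). \tag{9.17}$$

For $n = 0$ we adopt the convention of an empty sum being interpreted as zero.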
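Finally we have corresponding to (9.11) that

$$\bar{K}_n^{(j+1)} = \sum_{m=0}^{n-1} \bar{K}_m^{(j+1)} \bar{B}_{2(n-m)+1}^{(j)} \bar{K}_0^{(j+1)} = \bar{B}_{2n+1}^{(j)} + \sum_{m=1}^{n-1} \bar{K}_m^{(j+1)} \bar{B}_{2(n-m)+1}^{(j)} \quad (n \ge 1), \tag{9.18}$$

where again the empty sum for $n = 1$ is interpreted as zero. As with $\bar{L}_n^{(j+1)}$, the leading term on the right-hand side corresponds to a single-step transition in $\mathcal{N}_j$ while the sum incorporates paths involving more than one step in $\mathcal{N}_j$.

As with the computations involved in Algorithm H, $\bar{B}_0^{(N)}$ can be calculated in a finite number of steps. We may identify the relevant steps as follows.

We require initial input of $A_0, A_1, \ldots, A_{2^N-1}$. First we specify $\bar{A}_n^{(1)} = A_n$ for $n = 0, 1, \ldots, 2^N - 1$. We calculate

$$\bar{A}_0^{(j)}, \bar{A}_1^{(j)}, \ldots, \bar{A}_{2^{N-j}-1}^{(j)},$$
$$\bar{B}_0^{(j)}, \bar{B}_2^{(j)}, \bar{B}_3^{(j)}, \ldots, \bar{B}_{2^{N-j}-1}^{(j)}$$

recursively for $j = 2, \ldots, N$ as follows. We have

$$\bar{B}_n^{(j)} = \left[I - \bar{A}_1^{(j)}\right]^{-1} \bar{A}_n^{(j)} \quad (n = 0, 2, \ldots, 2^{N-j} - 1). \tag{9.19}$$

To obtain the matrices $\bar{A}_0^{(j+1)}, \bar{A}_1^{(j+1)}, \ldots, \bar{A}_{2^{N-j-1}-1}^{(j+1)}$, first evaluate the auxiliary quantities

$$\bar{K}_n^{(j+1)} = \bar{B}_{2n+1}^{(j)} + \sum_{m=1}^{n-1} \bar{K}_m^{(j+1)} \bar{B}_{2(n-m)+1}^{(j)} \tag{9.20}$$

for $n = 1, 2, \ldots, 2^{N-j-1} - 1$ and

$$\bar{L}_n^{(j+1)} = \bar{B}_{2n}^{(j)} + \sum_{m=1}^{n} \bar{K}_m^{(j+1)} \bar{B}_{2(n-m)}^{(j)} \tag{9.21}$$

for $n = 0, 1, \ldots, 2^{N-j-1} - 1$ and then calculate

$$\bar{A}_0^{(j+1)} = \bar{B}_0^{(j)} \bar{L}_0^{(j+1)}, \tag{9.22}$$

$$\bar{A}_n^{(j+1)} = \bar{B}_{2n-1}^{(j)} + \sum_{m=0}^{n} \bar{B}_{2m}^{(j)} \bar{L}_{n-m}^{(j+1)} \quad (n = 1, \ldots, 2^{N-j-1} - 1). \tag{9.23}$$

We shall make use of this construction in subsequent sections.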


9.6 H, G and convergence rates 


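For $j \ge 1$, define $M^{(j)}$ by

$$\left(M^{(j)}\right)_{r,s} = P\left[\, \bigcup_{t \ge 0} \Phi^{(j)}_{s,t} \,\Big|\, X_0 = (0, r) \right] \tag{9.24}$$

for $r, s \in \mathcal{K}$, where

$$\Phi^{(j)}_{s,t} = \left\{ X_t = (2^{j-1}-1, s),\ 0 \le Y_u < 2^j - 1,\ Y_u \ne 2^{j-1} - 1\ (0 \le u < t) \right\}.$$

We note that this gives

$$M^{(1)} = I. \tag{9.25}$$

Also for $j \ge 1$ we put

$$\left(V_j\right)_{r,s} = P\left[\, \bigcup_{t > 0} \left\{ X_t = (-1, s),\ 2^{j-1} - 1 \le \widehat{Y}_t < 2^j - 1 \right\} \,\Big|\, X_0 = (0, r) \right]$$

for $r, s \in \mathcal{K}$. Here $\widehat{Y}_t = \max_{0 \le u < t} Y_u$. We may interpret $V_j$ as the contribution to $G$ from those trajectories which reach a level of at least $2^{j-1} - 1$ but achieve a maximum level less than $2^j - 1$. By decomposing $G$ according to the maximum level reached by a trajectory contributing to it, we can thus derive

$$G = \sum_{j=1}^{\infty} V_j. \tag{9.26}$$

For $j = 1$, we have also that

$$\left(V_1\right)_{r,s} = P\left[\, \bigcup_{t > 0} \left\{ X_t = (-1, s),\ Y_u = 0\ (0 \le u < t) \right\} \,\Big|\, X_0 = (0, r) \right],$$

so that

$$V_1 = \bar{B}_0^{(1)}. \tag{9.27}$$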

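Proposition 9.6.1 For $j \ge 1$, the matrices $V_j$, $M^{(j)}$ are related by

$$V_j = M^{(j)} \bar{B}_0^{(j)}. \tag{9.28}$$

Proof. By (9.25) and (9.27), the result is immediate for $j = 1$, so suppose $j > 1$. Since $\mathcal{A}$ is skip-free from above in levels, every trajectory contributing to $V_j$ must at some time pass through level $2^{j-1} - 1$. Conditioning on the first entry into level $2^{j-1} - 1$, we have

$$\left(V_j\right)_{r,s} = \sum_{m \in \mathcal{K}} \left(M^{(j)}\right)_{r,m} P\left[\, \bigcup_{t > 0} \Psi^{(j)}_{s,t} \,\Big|\, X_0 = (2^{j-1}-1, m) \right] = \sum_{m \in \mathcal{K}} \left(M^{(j)}\right)_{r,m} P\left[ \bar{X}_1^{(j)} = (-1, s) \,\Big|\, \bar{X}_0^{(j)} = (0, m) \right],$$

where

$$\Psi^{(j)}_{s,t} = \left\{ X_t = (-1, s),\ 0 \le Y_u < 2^j - 1\ (0 < u < t) \right\},$$

giving the required result.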


Once a convenient recursion is set up for the determination of the matrix $M^{(j)}$, this may be used to set up Algorithm $\bar{H}$ for approximating $G$ by use of (9.28). Iteration $N$ of Algorithm $\bar{H}$ gives the estimate

$$\bar{T}_N = \sum_{j=1}^{N+1} V_j \tag{9.29}$$

for $G$.

The estimate $\bar{T}_N$ is the contribution to $G$ (describing transitions from level 0 to level $-1$) by paths which reach a level of at most $2^{N+1} - 2$. The estimate $T_N$ from the first $N$ iterations of Algorithm H is the contribution from paths which reach a level of $2^{N+1} - 1$ at most. Hence we have the interlacing property

$$\bar{T}_1 \le T_1 \le \bar{T}_2 \le T_2 \le \bar{T}_3 \le \cdots$$

connecting the successive approximations of $G$ in Algorithms $\bar{H}$ and H. We have $\bar{T}_N \uparrow G$ and $T_N \uparrow G$ as $N \to \infty$.

The interlacing property yields

$$\|(G - \bar{T}_1)e\| \ge \|(G - T_1)e\| \ge \|(G - \bar{T}_2)e\| \ge \|(G - T_2)e\| \ge \cdots$$

or

$$\|e - \bar{T}_1 e\| \ge \|e - T_1 e\| \ge \|e - \bar{T}_2 e\| \ge \|e - T_2 e\| \ge \cdots$$

in the case of stochastic $G$.

The interlacing property need not carry over to other error measures such as goodness of fit to the equation $G = A(G)$. This will be further discussed in the subsequent partner chapter in this volume.


Theorem 9.6.1 If $\mathcal{A}$ is transient and irreducible, then $\bar{T}_N$ converges to $G$ quadratically as $N \to \infty$.


Proof. If $\mathcal{A}$ is transient and irreducible, then the maximal eigenvalue of $G$ is numerically less than unity. We may choose a matrix norm such that $0 < \|G\| \le \xi < 1$. Choose $\kappa \ge 1$ to be an upper bound for the norm of all substochastic $k \times k$ matrices. (For some norms $\kappa$ equal to 1 will suffice.) Then by (9.28)

$$\|V_j\| \le \kappa \left\|\bar{B}_0^{(j)}\right\| \le \kappa \left\|G^{2^{j-1}}\right\| \le \kappa\, \xi^{2^{j-1}}.$$

Hence

$$\left\| G - \sum_{j=1}^{N} V_j \right\| = \left\| \sum_{j=N+1}^{\infty} V_j \right\| \le \sum_{j=N+1}^{\infty} \kappa\, \xi^{2^{j-1}} = \kappa\, \xi^{2^N} \sum_{\ell=0}^{\infty} \xi^{2^N (2^\ell - 1)} < \kappa\, \xi^{2^N} / (1 - \xi),$$

whence the stated result.


Corollary 9.6.1 By the convergence result for $\bar{T}_N$ and the interlacing property, Algorithm H also converges to $G$ quadratically when $\mathcal{A}$ is transient and irreducible.

Remark 1. Similarly to the argument for $\bar{B}_0^{(j)}$, we have that

$$\left(\bar{B}_2^{(j)}\right)_{r,s} = P\left[\, \bigcup_{t > 0} \Lambda^{(j)}_{s,t} \,\Big|\, X_0 = (2^{j-1}-1, r) \right], \tag{9.30}$$

where

$$\Lambda^{(j)}_{s,t} = \left\{ X_t = (2^j - 1, s),\ 0 \le Y_u < 2^j - 1\ (0 < u < t) \right\}.$$

We shall make use of (9.30) in the next section.


By the interlacing property, an implementation of Algorithm $\bar{H}$ would, for the same number of iterations, be no more accurate than Algorithm H. The computation of the auxiliary matrices $M^{(j)}$ appears to be in general quite complicated, so Algorithm $\bar{H}$ offers no special advantages. However it does have theoretical interest, as with the interlacing property shown above and the consequent information about the convergence rate of Algorithm H. We shall see in the next section that in the special case of a QBD, Algorithm $\bar{H}$ reduces to the logarithmic-reduction algorithm of Latouche and Ramaswami [14].


9.7 A special case: The QBD 


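There is considerable simplification to Algorithm $\bar{H}$ in the special case of a QBD, arising from the fact that this is skip-free in levels both from above and from below. We now investigate this situation.

Because $\bar{A}_n^{(j)} = 0$ for $n > 2$, (9.20) gives $\bar{K}_n^{(j+1)} = 0$ for $n > 0$. We have already seen that $\bar{K}_0^{(j+1)} = I$. Relation (9.21) now provides

$$\bar{L}_0^{(j+1)} = \bar{B}_0^{(j)}, \quad \bar{L}_1^{(j+1)} = \bar{B}_2^{(j)} \quad \text{and} \quad \bar{L}_n^{(j+1)} = 0 \ \text{ for } n > 1.$$

Equations (9.22) and (9.23) consequently yield

$$\bar{A}_n^{(j+1)} = \left(\bar{B}_n^{(j)}\right)^2 \quad \text{for } n = 0, 2, \tag{9.31}$$

$$\bar{A}_1^{(j+1)} = \bar{B}_0^{(j)} \bar{B}_2^{(j)} + \bar{B}_2^{(j)} \bar{B}_0^{(j)}. \tag{9.32}$$

The relations (9.19) give

$$\bar{B}_n^{(j)} = \left[I - \bar{A}_1^{(j)}\right]^{-1} \bar{A}_n^{(j)} \quad \text{for } n = 0, 2. \tag{9.33}$$

Equations (9.31)-(9.33) are simply the familiar defining relations for Algorithm LR. We now turn our attention to the matrix $M^{(j)}$. For a QBD with $j \ge 1$,

$$\left(M^{(j)}\right)_{r,s} = P\left[\, \bigcup_{t \ge 0} \Phi^{(j)}_{s,t} \,\Big|\, X_0 = (0, r) \right]$$

where

$$\Phi^{(j)}_{s,t} = \left\{ X_t = (2^{j-1}-1, s),\ 0 \le Y_u < 2^{j-1} - 1\ (0 \le u < t) \right\}$$

and so

$$\left(M^{(j+1)}\right)_{r,s} = P\left[\, \bigcup_{t \ge 0} \left\{ X_t = (2^j - 1, s),\ 0 \le Y_u < 2^j - 1\ (0 \le u < t) \right\} \,\Big|\, X_0 = (0, r) \right].$$

Thus in particular

$$\left(M^{(2)}\right)_{r,s} = P\left[\, \bigcup_{t \ge 0} \left\{ X_t = (1, s),\ Y_u = 0\ (0 \le u < t) \right\} \,\Big|\, X_0 = (0, r) \right] = P\left[ \bar{X}_1^{(1)} = (1, s) \,\Big|\, \bar{X}_0^{(1)} = (0, r) \right],$$

so that

$$M^{(2)} = \bar{B}_2^{(1)}. \tag{9.34}$$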


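For $j \ge 2$, we derive by conditioning on the first passage of $\mathcal{N}_1$ to level $2^{j-1} - 1$ that

$$\begin{aligned} \left(M^{(j+1)}\right)_{r,s} &= \sum_{m \in \mathcal{K}} P\left[\, \bigcup_{t \ge 0} \Phi^{(j)}_{m,t} \,\Big|\, X_0 = (0, r) \right] \times P\left[\, \bigcup_{v > 0} \Theta^{(j)}_{s,t,v} \,\Big|\, X_t = (2^{j-1}-1, m) \right] \\ &= \sum_{m \in \mathcal{K}} \left(M^{(j)}\right)_{r,m} \left(\bar{B}_2^{(j)}\right)_{m,s} \\ &= \left(M^{(j)} \bar{B}_2^{(j)}\right)_{r,s}, \end{aligned}$$

where

$$\Theta^{(j)}_{s,t,v} = \left\{ X_{t+v} = (2^j - 1, s),\ 0 \le Y_u < 2^j - 1\ (t < u < t+v) \right\}.$$

Thus

$$M^{(j+1)} = M^{(j)} \bar{B}_2^{(j)}$$

and so for $j \ge 2$

$$M^{(j+1)} = M^{(2)} \bar{B}_2^{(2)} \cdots \bar{B}_2^{(j)} = \bar{B}_2^{(1)} \bar{B}_2^{(2)} \cdots \bar{B}_2^{(j)}.$$

Taking this result with (9.34) yields

$$M^{(j)} = \bar{B}_2^{(1)} \cdots \bar{B}_2^{(j-1)} \quad \text{for } j \ge 2.$$

Finally, Proposition 9.6.1 provides

$$V_j = \begin{cases} \bar{B}_0^{(1)} & \text{for } j = 1, \\ \bar{B}_2^{(1)} \cdots \bar{B}_2^{(j-1)} \bar{B}_0^{(j)} & \text{for } j \ge 2. \end{cases}$$

With this evaluation for $V_j$, (9.29) is the formula employed in Algorithm LR for calculating approximations to $G$. Thus Algorithm $\bar{H}$ reduces to Algorithm LR in the case of a QBD.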
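
The QBD recursion just derived is short enough to sketch directly. The following pure-Python illustration accumulates the partial sums of the $V_j$ using the relations for $\bar{B}_0^{(j)}$, $\bar{B}_2^{(j)}$ and $M^{(j)}$ above. The function name `lr_estimates` and the matrix helpers are illustrative, and the test chain (a two-phase pure-birth/pure-death QBD with $p = 0.1$ and off-diagonal $A_1$ entries $p$ and $2p$) is an assumption made for the example.

```python
def eye(k):
    return [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inv(X):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(X)
    aug = [list(row) + unit for row, unit in zip(X, eye(k))]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def lr_estimates(A0, A1, A2, iterations):
    """Partial sums V_1 + ... + V_(I+1) for I = 1..iterations (QBD case)."""
    I = eye(len(A0))
    scale = inv(sub(I, A1))
    B0, B2 = mul(scale, A0), mul(scale, A2)      # B0^(1), B2^(1)
    total = B0                                   # V_1 = B0^(1)
    M = B2                                       # M^(2) = B2^(1)
    estimates = []
    for _ in range(iterations):
        next_A1 = add(mul(B0, B2), mul(B2, B0))  # A1 of the next stage
        next_A0, next_A2 = mul(B0, B0), mul(B2, B2)
        scale = inv(sub(I, next_A1))
        B0, B2 = mul(scale, next_A0), mul(scale, next_A2)
        total = add(total, mul(M, B0))           # add V_j = M^(j) B0^(j)
        M = mul(M, B2)                           # M^(j+1) = M^(j) B2^(j)
        estimates.append(total)
    return estimates

def row_sum_error(T):
    return max(abs(1.0 - sum(row)) for row in T)

# Assumed test chain: pure-birth/pure-death QBD with p = 0.1.
p = 0.1
A0 = [[1 - p, 0.0], [0.0, 0.0]]
A1 = [[0.0, p], [2 * p, 0.0]]
A2 = [[0.0, 0.0], [0.0, 1 - 2 * p]]
errors = [row_sum_error(T) for T in lr_estimates(A0, A1, A2, 7)]
print(errors)
```

Only two matrix inversions' worth of state ($\bar{B}_0$, $\bar{B}_2$) plus the running product $M$ need to be kept between iterations, which is what makes the QBD case so much lighter than the general recursion.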

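In the case of a QBD, some simplifications occur also in Algorithm H. Since $A_n^{(j)} = 0$ for $n > 2$, we have $K_n^{(j+1)} = 0$ for $n > 0$ and so

$$L_0^{(j+1)} = K_0^{(j+1)} A_0^{(j)},$$
$$L_1^{(j+1)} = K_0^{(j+1)} A_2^{(j)},$$
$$L_n^{(j+1)} = 0 \quad \text{for } n > 1.$$

Also $B_n^{(j)} = 0$ for $n > 2$.

The relations linking $\mathcal{A}_{j+1}$ and $\mathcal{A}_j$ are thus

$$A_1^{(j+1)} = A_1^{(j)} + A_0^{(j)} K_0^{(j+1)} A_2^{(j)} + A_2^{(j)} K_0^{(j+1)} A_0^{(j)},$$
$$B_1^{(j+1)} = B_1^{(j)} + B_2^{(j)} K_0^{(j+1)} A_0^{(j)},$$
$$B_2^{(j+1)} = B_2^{(j)} K_0^{(j+1)} A_2^{(j)}.$$

The initialization is

$$A_i^{(0)} = A_i \quad (i = 0, 1, 2), \qquad B_i^{(0)} = A_i \quad (i = 1, 2).$$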


As a result, Algorithm H can, in the QBD case, be run in a very similar 
way to Algorithm LR. The censorings and algebraic detail are, however, quite 
different. 

We programmed the LR Algorithm and ran it and Algorithm H on an 
example given in [7] and subsequently in [1] and [2]. 


Example 5. Latouche and Ramaswami's pure-birth/pure-death process. This example is a QBD with

$$A_0 = \begin{bmatrix} 1-p & 0 \\ 0 & 0 \end{bmatrix}, \quad A_1 = \begin{bmatrix} 0 & p \\ 2p & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1-2p \end{bmatrix}.$$

We chose $p$ equal to 0.1.

In presenting results we employ $G_I$ as a generic notation for the approximation to $G$ after $I$ iterations with the algorithms involved, viz., $\bar{T}_I$ and $T_I$ in the present case.

The results in Table 9.1 have errors

0.5672 > 0.4800 > 0.3025 > 0.2619 > ... > 4.9960e-14 > 4.3854e-14,

illustrating well the interlacing property.


Table 9.1 The interlacing property

                 LR                                        H
Iteration I      ||e - G_I e||_∞    CPU time (s)           ||e - G_I e||_∞    CPU time (s)
1                0.5672             0.001                  0.4800             0.000
2                0.3025             0.002                  0.2619             0.000
3                0.1027             0.003                  0.0905             0.004
4                0.0145             0.007                  0.0130             0.007
5                3.3283e-04         0.009                  2.9585e-04         0.009
6                1.7715e-07         0.011                  1.5747e-07         0.010
7                4.9960e-14         0.012                  4.3854e-14         0.010


9.8 Algorithms CR and H


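We now consider the relation between Algorithm H and Bini and Meini's Cyclic Reduction Algorithm CR. The latter is carried out in terms of formal power series, so to make a connection we need to express Algorithm H in these terms, too. For $j \ge 0$, we define

$$\psi^{(j)}(z) = \sum_{n=0}^{\infty} A_n^{(j)} z^n, \qquad \phi^{(j)}(z) = \sum_{n=0}^{\infty} B_{n+1}^{(j)} z^n.$$

We remark that since $\mathcal{A}_j$ is substochastic, these series are absolutely convergent for $|z| \le 1$. We encapsulate the odd- and even-labeled coefficients in the further generating functions

$$\psi_e^{(j)}(z) = \sum_{n=0}^{\infty} A_{2n}^{(j)} z^n, \qquad \psi_o^{(j)}(z) = \sum_{n=0}^{\infty} A_{2n+1}^{(j)} z^n,$$

$$\phi_e^{(j)}(z) = \sum_{n=0}^{\infty} B_{2(n+1)}^{(j)} z^n, \qquad \phi_o^{(j)}(z) = \sum_{n=0}^{\infty} B_{2n+1}^{(j)} z^n.$$

Again, these power series are all absolutely convergent for $|z| \le 1$. We introduce

$$L^{(j+1)}(z) = \sum_{n=0}^{\infty} L_n^{(j+1)} z^n, \qquad K^{(j+1)}(z) = \sum_{n=0}^{\infty} K_n^{(j+1)} z^n.$$

Multiplication of (9.6) by $z^n$, summing over $n \ge 1$ and adding (9.7) provides

$$\psi^{(j+1)}(z) = \sum_{n=1}^{\infty} A_{2n-1}^{(j)} z^n + \left( \sum_{m=0}^{\infty} A_{2m}^{(j)} z^m \right) \left( \sum_{n=0}^{\infty} L_n^{(j+1)} z^n \right) = z\,\psi_o^{(j)}(z) + \psi_e^{(j)}(z)\, L^{(j+1)}(z). \tag{9.35}$$

Similarly (9.8) gives

$$\phi^{(j+1)}(z) = \sum_{n=1}^{\infty} B_{2n-1}^{(j)} z^{n-1} + \left( \sum_{m=0}^{\infty} B_{2(m+1)}^{(j)} z^m \right) \left( \sum_{n=0}^{\infty} L_n^{(j+1)} z^n \right) = \phi_o^{(j)}(z) + \phi_e^{(j)}(z)\, L^{(j+1)}(z). \tag{9.36}$$

Forming generating functions from (9.9) in the same way leads to

$$L^{(j+1)}(z) = K^{(j+1)}(z)\, \psi_e^{(j)}(z), \tag{9.37}$$

while from (9.10) and (9.11) we derive

$$\begin{aligned} K^{(j+1)}(z) &= K_0^{(j+1)} + \sum_{n=1}^{\infty} z^n \sum_{m=0}^{n-1} K_m^{(j+1)} A_{2(n-m)+1}^{(j)} K_0^{(j+1)} \\ &= K_0^{(j+1)} + \sum_{m=0}^{\infty} K_m^{(j+1)} z^m \sum_{n=m+1}^{\infty} z^{n-m} A_{2(n-m)+1}^{(j)} K_0^{(j+1)} \\ &= K_0^{(j+1)} + \sum_{m=0}^{\infty} K_m^{(j+1)} z^m \sum_{\ell=1}^{\infty} z^{\ell} A_{2\ell+1}^{(j)} K_0^{(j+1)} \\ &= K_0^{(j+1)} + K^{(j+1)}(z) \sum_{\ell=1}^{\infty} z^{\ell} A_{2\ell+1}^{(j)} K_0^{(j+1)} \\ &= K_0^{(j+1)} + K^{(j+1)}(z) \left[ \psi_o^{(j)}(z) - A_1^{(j)} \right] K_0^{(j+1)} \\ &= K_0^{(j+1)} + K^{(j+1)}(z) \left[ \psi_o^{(j)}(z) - I \right] K_0^{(j+1)} + K^{(j+1)}(z) \left[ I - A_1^{(j)} \right] K_0^{(j+1)}. \end{aligned}$$

By (9.10) the last term on the right simplifies to $K^{(j+1)}(z)$. Hence we have

$$K_0^{(j+1)} = K^{(j+1)}(z) \left[ I - \psi_o^{(j)}(z) \right] K_0^{(j+1)}.$$

Postmultiplication by $I - A_1^{(j)} = \left[K_0^{(j+1)}\right]^{-1}$ yields

$$I = K^{(j+1)}(z) \left[ I - \psi_o^{(j)}(z) \right],$$

so that

$$K^{(j+1)}(z) = \left[ I - \psi_o^{(j)}(z) \right]^{-1}.$$

Hence we have from (9.37) that

$$L^{(j+1)}(z) = \left[ I - \psi_o^{(j)}(z) \right]^{-1} \psi_e^{(j)}(z).$$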
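
The closed form for the generating function of the matrices $K_n^{(j+1)}$ can be checked numerically in the scalar case $k = 1$. The sketch below builds the coefficients from the recursion (the initial value $1/(1 - A_1)$ followed by the convolution over odd-indexed $A$'s) and compares the resulting power series with the reciprocal of one minus the odd-coefficient series. The coefficient list `A`, the evaluation point `z` and the function name `k_series` are arbitrary illustrative choices.

```python
def k_series(A, n_terms):
    """Scalar (k = 1) coefficients K_0, K_1, ... built from the recursion."""
    a = lambda i: A[i] if i < len(A) else 0.0
    K = [1.0 / (1.0 - a(1))]                      # K_0 = (1 - A_1)^-1
    for n in range(1, n_terms):
        # K_n = sum_{m < n} K_m * A_{2(n-m)+1} * K_0
        K.append(sum(K[m] * a(2 * (n - m) + 1) for m in range(n)) * K[0])
    return K

A = [0.4, 0.3, 0.15, 0.1, 0.05]   # illustrative substochastic scalar sequence
z = 0.5

K = k_series(A, 60)
series_value = sum(Kn * z ** n for n, Kn in enumerate(K))

# Odd-coefficient series: A_1 + A_3 z + A_5 z^2 + ...
psi_odd = sum(A[2 * n + 1] * z ** n for n in range(len(A) // 2))
closed_form = 1.0 / (1.0 - psi_odd)

print(series_value, closed_form)
```

Here only $A_1$ and $A_3$ are nonzero among the odd-indexed coefficients, so the $K_n$ form a geometric sequence and the truncated series agrees with the closed form to machine precision.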


We now substitute for $L^{(j+1)}(z)$ in (9.35) and (9.36) to obtain

$$\psi^{(j+1)}(z) = z\,\psi_o^{(j)}(z) + \psi_e^{(j)}(z) \left[ I - \psi_o^{(j)}(z) \right]^{-1} \psi_e^{(j)}(z) \tag{9.38}$$

and

$$\phi^{(j+1)}(z) = \phi_o^{(j)}(z) + \phi_e^{(j)}(z) \left[ I - \psi_o^{(j)}(z) \right]^{-1} \psi_e^{(j)}(z). \tag{9.39}$$

We have also that

$$\psi^{(0)}(z) = \sum_{n=0}^{\infty} A_n z^n \tag{9.40}$$

and

$$\phi^{(0)}(z) = \sum_{n=0}^{\infty} A_{n+1} z^n. \tag{9.41}$$

The recursive relations (9.38)-(9.41) are precisely the generating functions used in CR (see [5]). Thus Algorithm H is equivalent to the cyclic reduction procedure whenever the latter is applicable.

The formulation of Algorithm CR of Bini and Meini that we derived above 
is the simpler of two versions given in [5]. Bini and Meini have developed the 
theme of [5] in this and a number of associated works (see, for example,
[3]-[10]).

The proofs in [5] are more complicated than those we use to establish 
Algorithm H. Furthermore, their treatment employs a number of assumptions 
that we have not found necessary. The most notable of these are restrictions 
as to the M/G/1-type Markov chains to which the results apply. Like us, 
they assume A is irreducible. However they require also that A be positive 
recurrent and that the matrix G be irreducible and aperiodic. 

Further conditions imposed later in the proofs are less straightforward. 
Several alternative possibilities are proposed which can be used to lead to 
desired results. These are: 


So Sa 
(a) that the matrix (1 ae y is bounded above; 


(b) that the limit $P' = \lim_{j \to \infty} P^{(j)}$ exists and the matrix $P'$ is the one-step transition matrix of a positive recurrent Markov chain;

(c) that the matrix $A_1^{(j)}$ is irreducible for some $j$;

(d) that the matrices $\sum_{i=1}^{\infty} A_i^{(j)}$ are irreducible for every $j$ and do not converge to a reducible matrix.


Chapter 10


A comparison of probabilistic and invariant subspace methods for the block M/G/1 Markov chain


Emma Hunt


Abstract A suite of numerical experiments is used to compare Algorithm H 
and other probability-based algorithms with invariant subspace methods for 
determining the fundamental matrix of an M/G/1-type Markov chain. 


Key words: Block M/G/1 Markov chain, fundamental matrix, invariant 
subspace methods, probabilistic algorithms, Algorithm H 


10.1 Introduction 


In a preceding chapter in this volume, we discussed the structure of a new 
probabilistic Algorithm H for the determination of the fundamental matrix 
G of a block M/G/1 Markov chain. We assume familiarity with the ideas 
and notation of that chapter. In the current chapter we take a numerical 
standpoint and compare Algorithm H with other, earlier, probability-based 
algorithms and with an invariant subspace approach. 

The last-mentioned was proposed recently by Akar and Sohraby [4] for 
determining the fundamental matrix G of an M/G/1-type Markov chain or 
the rate matrix R of a GI/M/1-type Markov chain. Their approach applies 
only for special subclasses of chains. For the M/G/1 case this is where 


$$A(z) = \sum_{i=0}^{\infty} A_i z^i$$

is irreducible for $0 < z \le 1$ and is a rational function of $z$. The analysis can then be conducted in terms of solving a matrix polynomial equation of finite degree. Following its originators, we shall refer to this technique as TELPACK. It is important to note that TELPACK applies only in the positive recurrent case.

It is natural to have high hopes for such an approach, since it exploits special structure and circumvents the necessity for truncations being made to the sequence $(A_k)_{k \ge 0}$. The solution to the polynomial problem is effected via
a so-called invariant subspace approach. The invariant subspace approach 
is one that has been extensively used over the past 20 years for attacking 
an important problem in control theory, that of solving the algebraic Ric- 
cati equation. This has been the object of intense study and many solution 
variants and refinements exist. 

A further treatment relating to the M/G/1-type chain has been given 
by Gail, Hantler and Taylor [6]. Akar, Oguz and Sohraby [3] also treat the 
finite quasi-birth-and-death process by employing either Schur decomposition 
or matrix-sign-function iteration to find bases for left- and right-invariant 
subspaces. 

In connection with a demonstration of the strength of the invariant sub- 
space method, Akar, Oguz and Sohraby [1], [2] made available a suite of 
examples of structured M/G/1 and GI/M/1 Markov chains which may be 
regarded as standard benchmarks. These formed part of a downloadable pack- 
age, including C code implementations of the invariant subspace approach, 
which was until very recently available from Khosrow Sohraby’s home page at 
http://www.cstp.umkc.edu/org/tn/telpack/home-html. This site no longer 
exists. 

Section 2 addresses some issues for error measures used for stopping rules 
for iterative algorithms, in preparation for numerical experiments. In Sec- 
tion 3 we perform numerical experiments, drawing on a suite of TELPACK 
M/G/1 examples and a benchmark problem of Daigle and Lucantoni. Our 
experiments illustrate a variety of points and provide some surprises. We 
could find no examples in the literature for which A(z) is not rational, so 
have supplied an original example. 


10.2 Error measures 


In [8], Meini noted that, in the absence of an analysis of numerical stability, the common error measure

$$\|e - G_I e\|_\infty \tag{10.1}$$

applied when $G$ is stochastic may not be appropriate for TELPACK. She proposed instead the measure

$$\|G_I - A(G_I)\|_\infty, \tag{10.2}$$

which is also appropriate in the case of substochastic $G$.

We now note that a closer approximation to $G$ can on occasion give rise to a worse error as measured by (10.2). That is, it can happen that there are substochastic matrices $G_0$, $G_1$ simultaneously satisfying

$$\|G - G_1\|_\infty < \|G - G_0\|_\infty, \tag{10.3}$$

and in fact $0 \le G_0 \le G_1 \le G$, but with

$$\|G_1 - A(G_1)\|_\infty > \|G_0 - A(G_0)\|_\infty. \tag{10.4}$$

We shall make use of the QBD given by

$$A_0 = \begin{bmatrix} 1-p & 0 \\ 0 & 0 \end{bmatrix}, \quad A_1 = \begin{bmatrix} 0 & p \\ rp & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1-rp \end{bmatrix} \tag{10.5}$$

with

$$r \ge 1 \quad \text{and} \quad 0 < p < 1/r. \tag{10.6}$$

This is an extension of the pure-birth/pure-death process of Latouche and Ramaswami [7]. With these parameter choices, the QBD is irreducible. It is null recurrent for $r = 1$ and positive recurrent for $r > 1$, with fundamental matrix

$$G = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}.$$

Also for any matrix

$$G_I = \begin{bmatrix} x & 0 \\ y & 0 \end{bmatrix} \quad \text{with } 0 \le x, y \le 1, \tag{10.7}$$

we have

$$A(G_I) = \begin{bmatrix} 1 - p + py & 0 \\ rpx + (1 - rp)xy & 0 \end{bmatrix}.$$

Take $r = 1$ and $p = 1/2$ and put

$$G_0 = \begin{bmatrix} 0.5 & 0 \\ 0.5 & 0 \end{bmatrix}, \qquad G_1 = \begin{bmatrix} 0.6 & 0 \\ 0.9 & 0 \end{bmatrix}.$$

We have

$$\|G - G_1\|_\infty = 0.4 < 0.5 = \|G - G_0\|_\infty$$

and so (10.3) holds. Also

$$A(G_0) = \begin{bmatrix} 0.75 & 0 \\ 0.375 & 0 \end{bmatrix}, \quad \text{so that} \quad \|G_0 - A(G_0)\|_\infty = 0.25,$$

and

$$A(G_1) = \begin{bmatrix} 0.95 & 0 \\ 0.57 & 0 \end{bmatrix}, \quad \text{so that} \quad \|G_1 - A(G_1)\|_\infty = 0.35.$$

We thus have (10.4) as desired.
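
The arithmetic of this counterexample is easy to reproduce. The sketch below evaluates $A(G) = A_0 + A_1 G + A_2 G^2$ for the two-phase QBD with blocks $A_0 = \mathrm{diag}(1-p, 0)$, off-diagonal $A_1$ entries $p$ and $rp$, and $A_2 = \mathrm{diag}(0, 1-rp)$, and then compares the two error measures. The function names and the choice of $G_0$ with both first-column entries 0.5 and $G_1$ with first-column entries 0.6 and 0.9 are assumptions made for the illustration.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def norm_inf(X):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in X)

def A_of(G, p, r):
    """A(G) = A0 + A1 G + A2 G^2 for the two-phase QBD described above."""
    A0 = [[1 - p, 0.0], [0.0, 0.0]]
    A1 = [[0.0, p], [r * p, 0.0]]
    A2 = [[0.0, 0.0], [0.0, 1 - r * p]]
    return add(A0, add(mul(A1, G), mul(A2, mul(G, G))))

G = [[1.0, 0.0], [1.0, 0.0]]     # fundamental matrix of the QBD
G0 = [[0.5, 0.0], [0.5, 0.0]]
G1 = [[0.6, 0.0], [0.9, 0.0]]

def measures(p, r):
    """Return (distance of G0, distance of G1, residual of G0, residual of G1)."""
    return (norm_inf(sub(G, G0)), norm_inf(sub(G, G1)),
            norm_inf(sub(G0, A_of(G0, p, r))), norm_inf(sub(G1, A_of(G1, p, r))))

d0, d1, res0, res1 = measures(p=0.5, r=1.0)
e0, e1, alt0, alt1 = measures(p=0.4, r=2.0)
print(d0, d1, res0, res1)
print(e0, e1, alt0, alt1)
```

In both parameter settings $G_1$ is closer to $G$ than $G_0$ is, yet its residual $\|G_1 - A(G_1)\|_\infty$ is the larger of the two.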
Further

$$\|G - G_I\|_\infty = \|e - G_I e\|_\infty$$

for $G_I$ of the form (10.7), so that we have also an example in which (10.4) and

$$\|e - G_1 e\|_\infty < \|e - G_0 e\|_\infty \tag{10.8}$$

hold simultaneously.

The inequalities (10.3) and (10.4) or (10.4) and (10.8) also occur simultaneously for the same choices of $G_0$ and $G_1$ when we take the QBD given by (10.5) with $r = 2$ and $p = 0.4$.

In the above examples, the two nonzero entries in $G_i - A(G_i)$ ($i = 0, 1$), that is, those in the leading column, have opposite sign. This is a particular instance of a general phenomenon with QBDs given by (10.5) and (10.6) and $G_I$ of the form (10.7).

The general result referred to is as follows. Suppose $G_I$ is of the form (10.7) with $y \le 1$, $x \le 1$ and $x < (1-p)/(1-rp)$. We have

$$(1-p)(1-y) \ge x(1-rp)(1-y)$$

or

$$x - [1 - p + py] \le -\left\{ y - [rpx + (1-rp)xy] \right\},$$

so that

$$\Theta_1 \le -\Theta_2,$$

where

$$\Theta_i = [G_I - A(G_I)]_{i,1} \quad (i = 1, 2).$$

If $\Theta_2 > 0$, then $\Theta_1 < 0$. Conversely, if $\Theta_1 > 0$, then $\Theta_2 < 0$. In particular, if $\Theta_1$ and $\Theta_2$ are both nonzero, then they are of opposite sign. This sort of behavior does not appear to have been reported previously.

It is also worthy of note that, because of its use of rational functions, no 


truncations need be involved in the computation of the error measure on the 
left in (10.2) with the use of TELPACK. 


10.3 Numerical experiments 


We now consider some numerical experiments testing Algorithm H. As previously noted, all outputs designated as TELPACK have been obtained running the C program downloaded from Khosrow Sohraby's website. All other code has been implemented by us in MATLAB.
The following experiments illustrate a variety of issues. 


10.3.1 Experiment G1 


Our first experiment is drawn from the suite of TELPACK M/G/1 examples.
We use 


T1+1 TT Ti+ 
Ags) (7) Gr). Gaye) ae) Gp) 


(a) 3)" Gay) 2)" (4B) 


for 1 > 0. This gives 


A(z) = 18 (1 tz) ~1 A(1 Woz) ~1 1 (1 nue re 


70 7 70 7 70 7 
z\-1 1 2 7 
(La7) 0 0 ii i ii 
= _ 10z)-1 ae eee 
a 0 (1 a) 0 21 21 21 
4z\—-1 9 9 12 
0 0 (1 — =) 70 70 70 


TO tas ae 
= [1 ding (F7)| 21 21 21 


This example provides the simplest form of rational A(z) for which every 
A; has nonzero elements and can be expected to favor TELPACK. 
Using the stopping criterion

$$\|(G_I - G_{I-1})e\|_\infty < \epsilon \tag{10.9}$$

with $\epsilon = 10^{-8}$, TELPACK converges in 7 iterations, Algorithm H in 5. In Table 10.1 we include also, for comparison, details of the performance of Algorithm H for 6 iterations.


Table 10.1 Experiment G1

TELPACK                                        H
I    ||e - G_I e||_∞    CPU Time (s)           I    ||e - G_I e||_∞    CPU Time (s)
7    9.9920e-16         0.070                  5    1.0617e-08         0.001
                                               6    5.5511e-16         0.001

The use of the stopping criterion (10.9) is fixed in TELPACK. We have 
therefore employed it wherever TELPACK is one of the algorithms being 
compared. 


10.3.2 Experiment G2 


We now consider a further numerical experiment reported by Akar and 
Sohraby in [4]. The example we take up has 


ANA 0.0002 0.9998] | 1 0 
~ | 0.9800 0.0200] |0 #E* 
The parameter $a$ is varied to obtain various values of the traffic intensity $\rho$, with the latter defined by

$$\rho = \pi A'(1) e,$$

where $\pi$ is the invariant probability measure of the stochastic matrix $A(1)$. See Neuts [9, Theorem 2.3.1 and Equation (3.1.1)].

In Tables 10.2 and 10.3 we compare the numbers of iterations required to calculate the fundamental matrix $G$ to a precision of $10^{-12}$ or better (using (10.9) as a stopping criterion) with several algorithms occurring in the literature. The Neuts Algorithm is drawn from [9] and is based on the iterative relations $G(0) = 0$ with

$$G(j+1) = (I - A_1)^{-1} \left[ A_0 + \sum_{i=2}^{\infty} A_i G^i(j) \right] \quad \text{for } j \ge 0. \tag{10.10}$$

The Ramaswami Algorithm [10] is based similarly on $G(0) = 0$ with

$$G(j+1) = \left( I - \sum_{i=1}^{\infty} A_i G^{i-1}(j) \right)^{-1} A_0 \quad \text{for } j \ge 0. \tag{10.11}$$
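
Both fixed-point schemes are straightforward to sketch when only finitely many blocks $A_i$ are nonzero. The pure-Python illustration below implements one step of each iteration, starting from the zero matrix and stopping when the change in row sums between successive iterates falls below a tolerance. The function names, the tolerance, the iteration cap and the two-phase QBD used as test data ($p = 0.1$, off-diagonal $A_1$ entries $p$ and $2p$) are all assumptions for the example rather than the experiment reported here.

```python
def zeros(k):
    return [[0.0] * k for _ in range(k)]

def eye(k):
    return [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inv(X):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(X)
    aug = [list(row) + unit for row, unit in zip(X, eye(k))]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def neuts_step(G, A):
    """G -> (I - A_1)^-1 [A_0 + sum_{i>=2} A_i G^i] for a finite list A."""
    acc, power = A[0], mul(G, G)           # power holds G^i, starting at i = 2
    for Ai in A[2:]:
        acc = add(acc, mul(Ai, power))
        power = mul(power, G)
    return mul(inv(sub(eye(len(G)), A[1])), acc)

def ramaswami_step(G, A):
    """G -> (I - sum_{i>=1} A_i G^(i-1))^-1 A_0 for a finite list A."""
    acc, power = zeros(len(G)), eye(len(G))  # power holds G^(i-1), from i = 1
    for Ai in A[1:]:
        acc = add(acc, mul(Ai, power))
        power = mul(power, G)
    return mul(inv(sub(eye(len(G)), acc)), A[0])

def iterate(step, A, eps=1e-12, max_iter=100000):
    """Iterate from G = 0 until the row sums change by less than eps."""
    G = zeros(len(A[0]))
    for count in range(1, max_iter + 1):
        G_new = step(G, A)
        change = max(abs(sum(new) - sum(old)) for new, old in zip(G_new, G))
        G = G_new
        if change < eps:
            return G, count
    raise RuntimeError("no convergence within max_iter iterations")

p, r = 0.1, 2.0
A = [[[1 - p, 0.0], [0.0, 0.0]], [[0.0, p], [r * p, 0.0]], [[0.0, 0.0], [0.0, 1 - r * p]]]
G_neuts, n_neuts = iterate(neuts_step, A)
G_rama, n_rama = iterate(ramaswami_step, A)
print(n_neuts, n_rama)
```

Both schemes converge only linearly, which is why their iteration counts grow so quickly as the traffic intensity approaches 1.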


As noted previously, TELPACK is designed for sequences $(A_i)$ for which the generating function $A(z)$ is, for $|z| \le 1$, a rational function of $z$. In this event the fundamental matrix satisfies a reduced matrix polynomial equation

$$F(G) = \sum_{i=0}^{f} F_i G^i = 0. \tag{10.12}$$

The Extended Neuts and Extended Ramaswami Algorithms are respectively extensions of the Neuts and Ramaswami Algorithms based on Equation (10.12). This enables the infinite sums to be replaced by finite ones. Assuming the invertibility of $F_1$, the recursions (10.10), (10.11) become respectively

$$G(0) = 0, \qquad G(j+1) = -F_1^{-1} \left[ F_0 + \sum_{i=2}^{f} F_i G^i(j) \right]$$

and

$$G(0) = 0, \qquad G(j+1) = -\left[ \sum_{i=1}^{f} F_i G^{i-1}(j) \right]^{-1} F_0.$$


Table 10.2 Experiment G2

ρ       Method                 ||e - G_I e||_∞    CPU Time (s)
0.20    Neuts                  3.8658e-…          0.070
        Ramaswami              1.6036e-1…         0.060
        Extended Neuts         1.1143e-…          0.020
        Extended Ramaswami     1.1696e-…          0.020
        TELPACK                6.6613e-1…         0.060
        H                      2.2204e-1…         0.001

0.40    Neuts                  1.1224e-1…         0.200
        Ramaswami              3.7181e-…          0.180
        Extended Neuts         3.0221e-1…         0.040
        Extended Ramaswami     1.9054e-1…         0.030
        TELPACK                2.2204e-…          0.060
        H                      1.1102e-…          0.001

0.60    Neuts                  2.3257e-1…         0.450
        Ramaswami              9.0550e-1…         0.330
        Extended Neuts         5.7625e-…          0.060
        Extended Ramaswami     4.1225e-1…         0.040
        TELPACK                4.4409e-1…         0.060
        H                      2.2204e-1…         0.001

0.80    Neuts                  5.4587e-…          1.230
        Ramaswami              4.2666e-1…         0.870
        Extended Neuts         1.4281e-1…         0.150
        Extended Ramaswami     1.0090e-1…         0.090
        TELPACK                4.4409e-1…         0.060
        H                      4.4409e-1…         0.001


Table 10.3 Experiment G2 continued

ρ       Method                 I       ||e - G_I e||_∞    CPU Time (s)
0.90    Neuts                  163     1.0630e-11         2.710
        Ramaswami              111     7.2653e-12         1.900
        Extended Neuts         451     3.2315e-11         0.310
        Extended Ramaswami     314     2.1745e-11         0.180
        TELPACK                6       6.6613e-16         0.060
        H                      8       1.1102e-16         0.001

0.95    Neuts                  315     2.2072e-11         5.430
        Ramaswami              214     1.5689e-11         3.810
        Extended Neuts         881     6.6349e-11         0.590
        Extended Ramaswami     606     4.4260e-11         0.350
        TELPACK                7       0                  0.060
        H                      9       1.1102e-15         0.001

0.99    Neuts                  1368    1.1456e-10         24.770
        Ramaswami              933     7.7970e-11         17.440
        Extended Neuts         3836    3.3809e-10         2.880
        Extended Ramaswami     2618    2.2548e-10         1.720
        TELPACK                8       1.9984e-15         0.060
        H                      11      1.0880e-14         0.003


TELPACK is designed to exploit situations in which A(z) is a rational 
function of z, so it is hardly surprising that it achieves a prescribed accu- 
racy in markedly fewer iterations than needed for the Neuts and Ramaswami 
Algorithms. What is remarkable is that these benefits do not occur for the Extended Neuts or Extended Ramaswami Algorithms. In fact they require almost three times as many iterations for a prescribed accuracy, although overall the extended algorithms take substantially less CPU time than do their counterparts despite the extra iterations needed.

In contrast to the Extended Neuts and Ramaswami Algorithms and TEL- 
PACK, Algorithm H holds for all Markov chains of block-M/G/1 type. In 
view of this Algorithm H compares surprisingly well. We note that it achieves 
accuracy comparable with that of TELPACK, with much smaller CPU times, 
for all levels of traffic intensity. 


10.3.3 The Daigle and Lucantoni teletraffic problem 


The most common choice of benchmark problem in the literature and the subject of our next three numerical experiments is a continuous-time teletraffic example of Daigle and Lucantoni [5]. This involves matrices expressed in terms of parameters $K$, $\rho_d$, $a$, $r$ and $M$. The defining matrices $A_i$ ($i = 0, 1, 2$) are of size $(K+1) \times (K+1)$. The matrices $A_0$ and $A_2$ are diagonal and prescribed by

$$(A_0)_{j,j} = 192\,[1 - j/(K+1)] \quad (0 \le j \le K), \qquad A_2 = 192\,\rho_d\, I.$$

The matrix $A_1$ is tridiagonal with

$$(A_1)_{j,j+1} = ar\,\frac{M-j}{M} \quad (0 \le j \le K-1), \qquad (A_1)_{j,j-1} = jr \quad (1 \le j \le K).$$

A physical interpretation of the problem (given in [5]) is as follows. 

A communication line handles both circuit-switched telephone calls and 
packet-switched data. There are a finite number M of telephone subscribers, 
each of whom has exponentially distributed on-hook and off-hook times, 
the latter having parameter r and the former being dependent upon the 
offered load a which is given in Erlangs. In particular, the rate for the on- 
hook distribution is given by the quantity a/(Mr). Data packets arrive ac- 
cording to a Poisson process and their lengths are assumed to be approx- 
imated well by an exponentially distributed random variable having mean 
8000. 

The communication line has a transmission capacity of 1.544 megabits per 
second of which 8000 bits per second are used for synchronization. Thus, at 
full line capacity, the line can transmit 192 packets per second. Each active 
telephone call consumes 64 kilobits per second. A maximum of min(M,23) 
active telephone subscribers are allowed to have calls in progress at any given 
time. The transmission capacity not used in servicing telephone calls is used 
to transmit data packets. Thus, if there are $i$ active callers, then the service
rate for the packets is $(1 - i/24) \times 192$. The offered load for the voice traffic
is fixed at 18.244 Erlangs. 
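
The rates quoted in this description can be checked with a few lines of arithmetic. The sketch below assumes (as the figures suggest, although the text does not say so explicitly) that one packet corresponds to the stated mean length of 8000 bits; the constant and function names are illustrative only.

```python
# Arithmetic behind the service rates quoted in the text.
# Assumption: one packet is taken to be 8000 bits (the stated mean length).
LINE_BPS = 1_544_000        # total line capacity, bits per second
SYNC_BPS = 8_000            # bits per second reserved for synchronization
CALL_BPS = 64_000           # bits per second consumed by one active call
MEAN_PACKET_BITS = 8_000    # mean packet length (assumed to be in bits)


def full_capacity_packets_per_second() -> float:
    """Packets per second when no telephone call is in progress."""
    return (LINE_BPS - SYNC_BPS) / MEAN_PACKET_BITS


def packet_service_rate(active_calls: int) -> float:
    """Packets per second left for data with `active_calls` calls in progress."""
    usable_bps = LINE_BPS - SYNC_BPS - active_calls * CALL_BPS
    return usable_bps / MEAN_PACKET_BITS


if __name__ == "__main__":
    print(full_capacity_packets_per_second())
    print(packet_service_rate(23))
```

Under this assumption each active call removes 8 packets per second from the data service rate, which is the same as the factor $(1 - i/24)$ applied to the full rate.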

Following the original numerical experiments in [5], the above example has 
been used as a testbench by a number of authors including Latouche and Ra- 
maswami [7] and Akar, Oguz and Sohraby [1], [2]. This example is a fairly de- 
manding comparative test for an algorithm designed for general M/G/1-type 
Markov chains, since it features a QBD. The logarithmic-reduction method in 
[7] is expressly designed for such processes. The matrix-sign-function method 
of [1] and [2] is designed for the more general, but still rather limited case, 
in which A(z) is a rational function of z. 

As our algorithm is designed for discrete-time rather than continuous-time
processes, we base our analysis on the embedded jump chain of the
continuous-time process, for which the entries in G are necessarily the same.

In Experiments G3 and G4 we employ the criterion (10.9) with $\epsilon = 10^{-12}$
used in the above stochastic-case references, together with the parameter
choice $K = 23$, the latter indicating that the matrices $A_i$ in the example are
of size $24 \times 24$.


10.3.3.1 Experiment G3


In this experiment the call holding rate is set at $r = 100^{-1}$ s$^{-1}$ and the calling
population size $M$ is fixed at 512. In Tables 10.4 and 10.5 we compare the
number of iterations involved in estimating G with different algorithms for a
range of traffic parameter values from $\rho_d = 0.01$ to $\rho_d = 0.29568$. The latter
value was noted by Daigle and Lucantoni [5] to correspond to an instability
limit. The algorithms considered are the logarithmic-reduction algorithm of 
Latouche and Ramaswami (LR), TELPACK and Algorithm H. We do not 
give iteration counts for the original experiments of Daigle and Lucantoni. 
These counts are not detailed in [5] but are mentioned as running to tens of 
thousands. 


Table 10.4 Iterations required with various traffic levels: Experiment G3


$\rho_d$    Method     $I$    $\|G_I - A(G_I)\|_\infty$    CPU Time (s)
0.010     TELPACK    10     1.5613e-16                 0.450
          LR          4     1.4398e-16                 0.010
          H           4     2.4460e-16                 0.010
0.025     TELPACK    10     1.8388e-16                 0.480
          LR          5     1.6306e-16                 0.020
          H           5     2.1164e-16                 0.020
0.050     TELPACK    10     2.6368e-16                 0.460
          LR          8     1.5959e-16                 0.040
          H           8     1.5959e-16                 0.040
0.075     TELPACK    10     2.2204e-16                 0.450
          LR         10     2.2204e-16                 0.040
          H          10     2.6368e-16                 0.040
0.100     TELPACK    10     1.5266e-16                 0.048
          LR         11     3.4781e-16                 0.040
          H          11     3.4478e-16                 0.040
0.120     TELPACK    10     2.0817e-16                 0.060
          LR         12     1.5266e-16                 0.060
          H          12     2.3592e-16                 0.060
0.140     TELPACK    10     3.6082e-16                 0.560
          LR         13     2.6368e-16                 0.060
          H          13     1.5266e-16                 0.060
0.160     TELPACK     9     2.2204e-16                 0.420
          LR         14     1.6653e-16                 0.060
          H          14     1.9429e-16                 0.060


Table 10.5 Iterations required with various traffic levels: Experiment G3 continued


$\rho_d$    Method     $I$    $\|G_I - A(G_I)\|_\infty$    CPU Time (s)
0.180     TELPACK     9     4.9960e-16                 0.470
          LR         14     1.6653e-16                 0.060
          H          14     1.9429e-16                 0.060
0.200     TELPACK     9     1.8822e-16                 0.420
          LR         15     1.1102e-16                 0.070
          H          15     1.9429e-16                 0.070
0.220     TELPACK     9     3.0531e-16                 0.410
          LR         15     3.6082e-16                 0.070
          H          15     2.2204e-16                 0.070
0.240     TELPACK    10     3.0531e-16                 0.450
          LR         16     1.3878e-16                 0.080
          H          16     1.1796e-16                 0.070
0.260     TELPACK    10     3.7383e-16                 0.470
          LR         17     2.4980e-16                 0.080
          H          17     2.2204e-16                 0.080
0.280     TELPACK    12     9.5659e-16                 0.530
          LR         18     1.9429e-16                 0.080
          H          18     1.1102e-16                 0.080
0.290     TELPACK    13     7.5033e-15                 0.560
          LR         20     2.2204e-16                 0.080
          H          20     1.3878e-16                 0.080
0.29568   TELPACK    20     1.5737e-09                 0.830
          LR         29     2.2204e-16                 0.100
          H          29     1.6653e-16                 0.100


It should be noted that in the references cited there is some slight variation 
between authors as to the number of iterations required with a given method, 
with larger differences at the instability limit. Akar et al. attribute this to 
differences in the computing platforms used [4]. All computational results 
given here are those obtained by us, either using our own MATLAB code or 
by running TELPACK. 


10.3.3.2 Experiment G4 


Our fourth numerical experiment fixed the offered data traffic at 15%, the
call holding rate at $r = 300^{-1}$ s$^{-1}$ and then considered system behavior as a
function of the calling population size $M$ (see Tables 10.6 and 10.7).


Table 10.6 Experiment G4


$M$       Method     $I$    $\|G_I - A(G_I)\|_\infty$    CPU Time (s)
64        TELPACK     9     3.7323e-16                 0.440
          LR         16     2.7756e-16                 0.030
          H          16     1.3878e-16                 0.030
128       TELPACK    10     3.0531e-16                 0.470
          LR         18     1.3878e-16                 0.060
          H          18     1.3878e-16                 0.060
256       TELPACK    11     4.2340e-16                 0.500
          LR         19     2.2204e-16                 0.050
          H          19     1.3878e-16                 0.060
512       TELPACK    12     6.6337e-16                 0.530
          LR         20     2.2204e-16                 0.070
          H          20     1.6653e-16                 0.070
1024      TELPACK    13     3.1832e-15                 0.550
          LR         21     2.4980e-16                 0.080
          H          21     1.9429e-16                 0.070
2048      TELPACK    13     3.8142e-14                 0.550
          LR         22     2.2204e-16                 0.080
          H          22     1.9429e-16                 0.080


Table 10.7 Experiment G4 continued


$M$       Method     $I$    $\|G_I - A(G_I)\|_\infty$    CPU Time (s)
4096      TELPACK    14     6.3620e-14                 0.530
          LR         23     1.9429e-16                 0.080
          H          23     2.7756e-16                 0.080
8192      TELPACK    15     1.5971e-13                 0.610
          LR         24     2.4980e-16                 0.090
          H          24     3.0531e-16                 0.090
16384     TELPACK    16     4.2425e-12                 0.650
          LR         25     2.2204e-16                 0.090
          H          25     2.2204e-16                 0.080
32768     TELPACK    17     2.5773e-11                 0.690
          LR         27     1.9429e-16                 0.100
          H          27     2.2204e-16                 0.100
65536     TELPACK    25     6.5647e-08                 0.960
          LR         32     1.9429e-16                 0.130
          H          32     2.2204e-16                 0.110


10.3.3.3 Overview of Experiments G3 and G4


In the light of its design versatility, Algorithm H compares quite well with the 
above-mentioned more specialist algorithms. Its performance with respect to 
CPU time and accuracy is comparable with that of the logarithmic-reduction 
(LR) algorithm. Both the logarithmic-reduction algorithm and Algorithm 
H require considerably less CPU time than does TELPACK (the difference 
in times sometimes being as much as an order of magnitude) for superior 
accuracy. 

In Experiments G3 and G4 we employ the alternative error measure
$\|G_I - A(G_I)\|_\infty < \epsilon$ suggested by Meini (see, for example, [8]). In terms of this
measure, the performance of TELPACK deteriorates steadily as $M$ increases,
whereas Algorithms H and LR are unaffected.
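
To make the residual measure $\|G - A(G)\|_\infty$ concrete, the following sketch evaluates it for a small quasi-birth-death example. It is not Algorithm H, LR or TELPACK: it uses the elementary functional iteration $G \leftarrow A_0 + A_1 G + A_2 G^2$, and the $2 \times 2$ blocks are hypothetical values chosen only so that the chain is positive recurrent.

```python
# Pure-Python sketch: functional iteration G <- A0 + A1*G + A2*G^2 for a QBD,
# monitored with the residual ||G - A(G)||_inf.
# The 2x2 blocks below are hypothetical and are NOT taken from the chapter.

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_add(X, Y):
    """Entrywise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inf_norm(X):
    """Matrix infinity norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in X)


def apply_A(G, A0, A1, A2):
    """Evaluate A(G) = A0 + A1*G + A2*G^2."""
    return mat_add(A0, mat_add(mat_mul(A1, G), mat_mul(A2, mat_mul(G, G))))


def residual(G, A0, A1, A2):
    """Error measure ||G - A(G)||_inf."""
    AG = apply_A(G, A0, A1, A2)
    return inf_norm([[g - a for g, a in zip(rg, ra)] for rg, ra in zip(G, AG)])


def functional_iteration(A0, A1, A2, eps=1e-13, max_iter=10_000):
    """Iterate from G = 0 until successive iterates differ by less than eps."""
    n = len(A0)
    G = [[0.0] * n for _ in range(n)]
    for count in range(1, max_iter + 1):
        G_new = apply_A(G, A0, A1, A2)
        step = inf_norm([[x - y for x, y in zip(rx, ry)]
                         for rx, ry in zip(G_new, G)])
        G = G_new
        if step < eps:
            return G, count
    return G, max_iter


# Hypothetical blocks: each row of A0 + A1 + A2 sums to one.
A0 = [[0.5, 0.0], [0.0, 0.5]]
A1 = [[0.0, 0.2], [0.2, 0.0]]
A2 = [[0.3, 0.0], [0.0, 0.3]]

if __name__ == "__main__":
    G, iterations = functional_iteration(A0, A1, A2)
    print(iterations, residual(G, A0, A1, A2))
```

This simple iteration converges only linearly, so it needs far more iterations than the counts reported in the tables; it is shown here solely to illustrate how the error measure is evaluated.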

The last two TELPACK entries in Tables 10.5 and 10.7 are in small type- 
face to indicate that TELPACK was unable to produce a result in these cases 
and crashed, generating the error message ‘segmentation fault.’ Reducing $\epsilon$
to $10^{-8}$ produced a result in both instances.


10.3.3.4 Experiment G5 


We ran Algorithm H on the Daigle and Lucantoni problem with the call
holding rate fixed at $r = 300^{-1}$ s$^{-1}$, the offered data traffic at 28% and the
calling population size $M$ at 65,536, varying the size of the matrices from
$24 \times 24$ to $500 \times 500$. In all cases we used (10.9) as a stopping criterion with
$\epsilon = 10^{-8}$.

We found that although the iteration counts decreased as the size of the 
matrices increased, CPU times increased substantially (see Table 10.8). This 
held for all matrix sizes except for 24 x 24 (the first entry in Table 10.8) where 
the computation required for the extra iterations outweighed the speed gain 
due to smaller matrix size. 


10.3.4 Experiment G6 


We now turn our attention to the case of a null recurrent process where the 
defining transition matrices for the system are given by 


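\[ A_0 = \begin{pmatrix} 0.4 & 0 \\ 0 & 0.4 \end{pmatrix}, \qquad A_1 = \begin{pmatrix} 0 & 0.1 \\ 0.2 & 0.2 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.2 \end{pmatrix}. \]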


Results for this experiment are given in Table 10.9. The stopping criterion
used was (10.9) with $\epsilon = 10^{-8}$. We note that this case is not covered by
the Akar and Sohraby methodology and therefore that TELPACK cannot be
used for this experiment. The results for the H and LR Algorithms are several
orders of magnitude more accurate than that for the Neuts Algorithm, with
significantly lower CPU times.


Table 10.8 Experiment G5 (Algorithm H)


$K$     Iterations $I$    $\|e - G_I e\|_\infty$    CPU Time (s)
23      29                9.4832e-09              0.110
24      19                5.1710e-11              0.080
25      18                8.0358e-11              0.080
26      17                2.6813e-08              0.090
27      17                9.2302e-11              0.100
28      17                4.4409e-16              0.100
29      16                2.5738e-08              0.110
39      15                2.3319e-11              0.200
49      14                1.1814e-09              0.260
59      14                2.2204e-15              0.600
69      13                3.6872e-08              1.130
79      13                4.5749e-10              2.250
89      13                4.5552e-12              4.170
99      13                5.3213e-13              7.670
149     12                3.0490e-09              76.400
299     12                9.7700e-15              853.990
499     12                5.8509e-14              3146.600


Table 10.9 Experiment G6


Method    Iterations $I$    $\|e - G_I e\|_\infty$    CPU Time (s)
Neuts     11307             2.1360e-04              10.950
LR        24                3.9612e-08              0.010
H         24                3.7778e-08              0.010


10.3.5 Experiment G7 


The numerical experiments above all involve matrix functions A(z) of rational 
form. We could find no examples in the literature for which A(z) is not 
rational. The following is an original example showing how Algorithm H 
(and the Neuts Algorithm) perform when A(z) is not rational. We note that 
these are the only two algorithms which can be applied here. 

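Suppose $p, q$ are positive numbers with sum unity. We define two $k \times k$
matrices $\Omega_0$, $\Omega_1$ with ‘binomial’ forms. Let

\[ \Omega_0 = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 0 \\ p^{k-1} & \binom{k-1}{1} p^{k-2} q & \cdots & \binom{k-1}{k-2} p q^{k-2} & q^{k-1} \end{pmatrix}, \]

\[ \Omega_1 = \begin{pmatrix} p & q & 0 & \cdots & 0 & 0 \\ p^2 & 2pq & q^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ p^{k-1} & \binom{k-1}{1} p^{k-2} q & \binom{k-1}{2} p^{k-3} q^2 & \cdots & \binom{k-1}{k-2} p q^{k-2} & q^{k-1} \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}. \]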
We now define

\[ A_0 = \Omega_0 e^{-r}, \]

\[ A_n = \Omega_0 e^{-r} \frac{r^n}{n!} + \Omega_1 e^{-r} \frac{r^{n-1}}{(n-1)!} \quad (n \ge 1), \]

for $r$ a positive number. We remark that

\[ A(z) := \sum_{m=0}^{\infty} A_m z^m = (\Omega_0 + z\Omega_1)\, e^{-r(1-z)} \quad (|z| \le 1), \]

so that $A(z)$ is irreducible for $0 < z \le 1$ and stochastic for $z = 1$.
Let $\omega := (\omega_1, \omega_2, \ldots, \omega_k)$ denote the invariant probability measure of

\[ \Omega := \Omega_0 + \Omega_1 = A(1). \]

Then the condition

\[ \omega A'(1) e \le 1 \]

for G to be stochastic (see [9, Theorem 2.3.1]) becomes

\[ \omega\, [\Omega_1 + r(\Omega_0 + \Omega_1)]\, e \le 1, \]

that is,

\[ \omega\, [(r+1) e - (0, 0, \ldots, 0, 1)^T] \le 1 \]

or

\[ r + 1 - \omega_k \le 1. \]

We deduce that G is stochastic if and only if

\[ r \le \omega_k. \]

The parameter choice $r = 1$ thus provides the new and interesting situation
of a transient chain. Results are given in Table 10.10 (with the size of the
matrices set to $5 \times 5$).
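
The stochasticity condition can be checked numerically. The sketch below rests on an assumed reading of the two ‘binomial’ matrices: rows 1 to $k-1$ of $\Omega_1$ hold the terms of $(p+q)^i$ and its last row is zero, while $\Omega_0$ is zero except for a last row holding the terms of $(p+q)^{k-1}$. The function names and the use of power iteration for $\omega$ are illustrative choices, not the author's code.

```python
from math import comb


def binomial_row(i, k, p, q):
    """Terms of (p + q)**i, padded with zeros to length k (requires i < k)."""
    row = [comb(i, j) * p ** (i - j) * q ** j for j in range(i + 1)]
    return row + [0.0] * (k - len(row))


def build_omegas(k, p):
    """Assumed forms of Omega_0 and Omega_1 (see the lead-in paragraph)."""
    q = 1.0 - p
    omega1 = [binomial_row(i, k, p, q) for i in range(1, k)] + [[0.0] * k]
    omega0 = [[0.0] * k for _ in range(k - 1)] + [binomial_row(k - 1, k, p, q)]
    return omega0, omega1


def invariant_measure(P, sweeps=5000):
    """Approximate the invariant probability vector of P by power iteration."""
    k = len(P)
    w = [1.0 / k] * k
    for _ in range(sweeps):
        w = [sum(w[i] * P[i][j] for i in range(k)) for j in range(k)]
    total = sum(w)
    return [x / total for x in w]


def mean_drift(k, p, r):
    """Return (omega A'(1) e, omega) with A'(1) = Omega_1 + r (Omega_0 + Omega_1)."""
    omega0, omega1 = build_omegas(k, p)
    P = [[omega0[i][j] + omega1[i][j] for j in range(k)] for i in range(k)]
    w = invariant_measure(P)
    row_sums = [sum(omega1[i]) + r * sum(P[i]) for i in range(k)]
    return sum(w[i] * row_sums[i] for i in range(k)), w


if __name__ == "__main__":
    drift, w = mean_drift(k=5, p=0.3, r=1.0)
    print(drift, 1.0 + 1.0 - w[-1])
```

Under these assumptions the computed drift equals $r + 1 - \omega_k$, and since $\omega_k < 1$ the choice $r = 1$ gives a drift above one, consistent with a transient chain.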

Since G is not stochastic we again revert to the use of 


\[ \|A(G_I) - G_I\|_\infty < \epsilon \]


as an error measure. 


Table 10.10 Experiment G7: a transient process


$p$     Method    $I$    $\|G_I - A(G_I)\|_\infty$    CPU Time (s)
0.05    Neuts     78     5.2153e-13                 1.950
        H          6     1.1102e-16                 0.006
0.1     Neuts     38     6.5445e-13                 0.960
        H          5     1.1102e-16                 0.003
0.2     Neuts     18     3.3762e-13                 0.480
        H          4     8.3267e-17                 0.002
0.3     Neuts     12     3.5207e-13                 0.310
        H          3     2.5153e-17                 0.002
0.4     Neuts      8     5.8682e-14                 0.210
        H          2     2.3043e-13                 0.001
0.5     Neuts      6     2.1154e-14                 0.150
        H          2     5.5511e-17                 0.001
0.6     Neuts      4     1.5774e-13                 0.130
        H          2     1.7347e-18                 0.001
0.7     Neuts      3     1.0413e-13                 0.100
        H          1     2.7311e-15                 0.001
0.8     Neuts      2     6.4682e-13                 0.080
        H          1     2.7756e-17                 0.001


Chapter 11


Interpolating maps, the modulus map 
and Hadamard’s inequality 


S. S. Dragomir, Emma Hunt and C. E. M. Pearce


Abstract Refinements are derived for both parts of Hadamard’s inequality 
for a convex function. The main results deal with the properties of various 
mappings involved in the refinements. 


Key words: Convexity, Hadamard inequality, interpolation, modulus map 


11.1 Introduction 


A cornerstone of convex analysis and optimization is Hadamard’s inequality,
which in its basic form states that for a convex function $f$ on a proper finite
interval $[a, b]$

\[ f\left(\frac{a+b}{2}\right) \le \frac{1}{b-a}\int_a^b f(x)\,dx \le \frac{f(a)+f(b)}{2}, \]

whereas the reverse inequalities hold if $f$ is concave. For simplicity we take $f$
as convex on $[a,b]$ throughout our discussion. The three successive terms in
Hadamard’s inequality are all means of $f$ over the interval $[a,b]$. We denote
them respectively by $m_f(a,b)$, $\mathcal{M}_f(a,b)$, $M_f(a,b)$, or simply by $m$, $\mathcal{M}$, $M$
when $f$, $a$, $b$ are understood. The bounds $m$ and $M$ for $\mathcal{M}$ are both tight.
More generally, the integral mean $\mathcal{M}$ is defined by

\[ \mathcal{M}_f(a,b) = \begin{cases} \dfrac{1}{b-a}\displaystyle\int_a^b f(x)\,dx & \text{if } a \ne b, \\ f(a) & \text{if } a = b. \end{cases} \]

The Hadamard inequality can then be written as

\[ m_f(a,b) \le \mathcal{M}_f(a,b) \le M_f(a,b) \tag{11.1} \]

without the restriction $a \ne b$.
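
As a quick numerical illustration of (11.1), the sketch below evaluates the three means for a given convex function. The integral mean is approximated by a composite midpoint rule, which is an illustrative choice rather than anything prescribed in the chapter, and the function names are hypothetical.

```python
import math


def midpoint_mean(f, a, b, n=20_000):
    """Integral mean of f over [a, b] by the midpoint rule (f(a) if a == b)."""
    if a == b:
        return f(a)
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h / (b - a)


def hadamard_terms(f, a, b):
    """Return the midpoint value m, the integral mean, and the endpoint mean M."""
    m = f((a + b) / 2)
    integral_mean = midpoint_mean(f, a, b)
    M = (f(a) + f(b)) / 2
    return m, integral_mean, M


if __name__ == "__main__":
    # For f = exp on [0, 1] the integral mean is e - 1.
    print(hadamard_terms(math.exp, 0.0, 1.0))
```

For a convex $f$ the three returned values appear in nondecreasing order; for a concave $f$ the order is reversed.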

There is a huge literature treating various refinements, generalizations and
extensions of this result. For an account of these, see the monograph [4]. Work
on interpolations frequently involves use of the auxiliary function

\[ \phi_t(p,q) := pt + q(1-t), \quad t \in [0,1], \]

for particular choices of $p, q$. Thus a continuous interpolation of the first part
of Hadamard’s inequality is available via the map $H_f : [0,1] \to \mathbb{R}$ given by

\[ H_f(t) := \frac{1}{b-a}\int_a^b f(y_t(x))\,dx, \]

where for $x \in [a, b]$ we set

\[ y_t(x) := \phi_t(x, (a+b)/2). \]


Theorem A. We have that:
(a) $H_f$ is convex;
(b) $H_f$ is nondecreasing with $H_f(0) = m$, $H_f(1) = \mathcal{M}$.


The first part of Hadamard’s inequality is associated with Jensen’s
inequality and has proved much more amenable to analysis than the second,
though the latter is the subject of a number of studies; see for example Dragomir,
Milošević and Sándor [3] and Dragomir and Pearce [5]. In the former study
a map $G_f : [0,1] \to \mathbb{R}$ was introduced, defined by

\[ G_f(t) := \frac{1}{2}\,[f(u_1) + f(u_2)], \]

where $u_1(t) := y_t(a)$ and $u_2(t) := y_t(b)$ for $t \in [0,1]$.
Theorem B. The map $G_f$ enjoys the following properties:
(i) $G_f$ is convex on $[0,1]$;
(ii) $G_f$ is nondecreasing on $[0,1]$ with $G_f(0) = m$, $G_f(1) = M$;
(iii) we have the inequalities

\[ 0 \le H_f(t) - m \le G_f(t) - H_f(t) \quad \forall t \in [0,1] \tag{11.2} \]

and

\[ f\left(\frac{a+b}{2}\right) \le \frac{1}{2}\left[ f\left(\frac{3a+b}{4}\right) + f\left(\frac{a+3b}{4}\right) \right] \le \int_0^1 G_f(t)\,dt \le \frac{m+M}{2}. \tag{11.3} \]

Inequality (11.2) was proved for differentiable convex functions. As this 
class of functions is dense in the class of all convex functions defined on the 
same interval with respect to the topology induced by uniform convergence, 
(11.2) holds also for an arbitrary convex map. 
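
The interpolating maps can be explored numerically. The sketch below assumes the reading $y_t(x) = tx + (1-t)(a+b)/2$, so that $H_f(0) = m$ and $G_f(1) = M$, approximates $H_f$ by a midpoint rule, and tests (11.2) on a grid of values of $t$. The names and the tolerance are illustrative choices.

```python
import math


def y(t, x, a, b):
    """Assumed form of y_t(x): a convex combination of x and the midpoint."""
    return t * x + (1 - t) * (a + b) / 2


def H(f, t, a, b, n=4000):
    """Midpoint-rule approximation of H_f(t), the integral mean of f(y_t(x))."""
    h = (b - a) / n
    return sum(f(y(t, a + (i + 0.5) * h, a, b)) for i in range(n)) * h / (b - a)


def G(f, t, a, b):
    """G_f(t) = [f(u_1) + f(u_2)] / 2 with u_1 = y_t(a) and u_2 = y_t(b)."""
    return 0.5 * (f(y(t, a, a, b)) + f(y(t, b, a, b)))


def check_11_2(f, a, b, steps=20, tol=1e-9):
    """Test 0 <= H_f(t) - m <= G_f(t) - H_f(t) for t = 0, 1/steps, ..., 1."""
    m = f((a + b) / 2)
    for k in range(steps + 1):
        t = k / steps
        h_val, g_val = H(f, t, a, b), G(f, t, a, b)
        if not (-tol <= h_val - m <= g_val - h_val + tol):
            return False
    return True


if __name__ == "__main__":
    print(check_11_2(math.exp, 0.0, 2.0))
```

For $f(x) = x^2$ the quantities can also be found by hand: $H_f(t) - m = t^2(b-a)^2/12$ and $G_f(t) - H_f(t) = t^2(b-a)^2/6$, so the second gap is exactly twice the first.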

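Dragomir, Milošević and Sándor introduced a further map $L_f : [0,1] \to \mathbb{R}$
given by

\[ L_f(t) := \frac{1}{2(b-a)}\int_a^b [f(u) + f(v)]\,dx, \tag{11.4} \]

where we define

\[ u(x) := \phi_t(a,x) \quad \text{and} \quad v(x) := \phi_t(b,x) \]

for $x \in [a,b]$ and $t \in [0,1]$.
The following was shown.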
The following was shown. 


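Theorem C. We have that
(1) $L_f$ is convex on $[0,1]$;
(2) for all $t \in [0,1]$

\[ G_f(t) \le L_f(t) \le (1-t)\mathcal{M} + tM \le M; \]

(3) for all $t \in [0,1]$

\[ H_f(1-t) \le L_f(t) \quad \text{and} \quad \frac{H_f(t) + H_f(1-t)}{2} \le L_f(t). \tag{11.5} \]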


In this chapter we take these ideas further and introduce results involving
the modulus map. With the notation of (11.1) in mind, it is convenient to
employ

\[ \alpha(x) := |x|, \qquad \iota(x) := x. \]

This gives in particular the identity

\[ M_f(a,b) = \mathcal{M}_\iota(f(a), f(b)) = \begin{cases} f(a) & \text{if } f(a) = f(b), \\ \dfrac{1}{f(b)-f(a)}\displaystyle\int_{f(a)}^{f(b)} x\,dx & \text{otherwise}. \end{cases} \]

In Section 2 we derive a refinement of the basic Hadamard inequality
and in Section 3 introduce further interpolations for the outer inequality
$G_f(t) - H_f(t) \ge 0$ in (11.2) and for $H_f(t) - m \ge 0$, all involving the modulus
map.

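For notational convenience we shall employ also in the sequel

\[ w_1(t) := \phi_t(a,b), \qquad w_2(t) := \phi_t(b,a). \]

We note that this gives

\[ f\left(\frac{a+w_1}{2}\right) = f(u_1), \qquad f\left(\frac{w_2+b}{2}\right) = f(u_2), \]

so that

\[ G_f(t) = \frac{1}{2}\left[ f\left(\frac{a+w_1}{2}\right) + f\left(\frac{w_2+b}{2}\right) \right]. \tag{11.6} \]

In Section 4 we derive some new results involving the identric mean $I(a, b)$
and in Section 5 introduce the univariate map $M_f : [0,1] \to \mathbb{R}$ given by

\[ M_f(t) := M_f(w_1, w_2), \]

so that

\[ M_f(t) = \frac{1}{2}\,[f(w_1) + f(w_2)] \tag{11.7} \]

and derive further results involving $L_f$. We remark that by convexity

\[ f(w_1) \le t f(a) + (1-t) f(b) \quad \text{and} \quad f(w_2) \le (1-t) f(a) + t f(b), \]

so that

\[ M_f(t) \le M. \tag{11.8} \]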


11.2 A refinement of the basic inequality 


We shall make repeated use of the following easy lemma. 


Lemma 2.1 If $f$, $g$ are integrable on some domain $I$ and

\[ f(x) \ge |g(x)| \quad \text{on } I, \]

then

\[ \int_I f(x)\,dx \ge \left| \int_I g(x)\,dx \right|. \]

Proof. We have $\int_I f(x)\,dx \ge \int_I |g(x)|\,dx \ge \left| \int_I g(x)\,dx \right|$.


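We now proceed to a refinement of Hadamard’s inequality for convex
functions. For this result, we introduce the symmetrization $f^s$ of $f$ on $[a,b]$, defined
by

\[ f^s(x) := \frac{1}{2}\,[f(x) + f(a+b-x)]. \]

This has the properties

\[ m_{f^s}(a,b) = m_f(a,b), \qquad \mathcal{M}_{f^s}(a,b) = \mathcal{M}_f(a,b), \qquad M_{f^s}(a,b) = M_f(a,b). \]


Theorem 2.2 Let $I \subset \mathbb{R}$ be an interval and $f : I \to \mathbb{R}$. Suppose $a, b \in I$
with $a < b$. Then

\[ M_f(a,b) - \mathcal{M}_f(a,b) \ge \left| \mathcal{M}_\alpha(f(a), f(b)) - \mathcal{M}_{\alpha \circ f}(a,b) \right| \tag{11.9} \]

and

\[ \mathcal{M}_f(a,b) - m_f(a,b) \ge \left| \mathcal{M}_{\alpha \circ f^s}(a,b) - |m_f(a,b)| \right|. \tag{11.10} \]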

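Proof. From the convexity of $f$, we have for $t \in [0,1]$ that

\[ 0 \le t f(a) + (1-t) f(b) - f(ta + (1-t)b). \]

By virtue of the general inequality

\[ |c - d| \ge \left|\, |c| - |d| \,\right|, \tag{11.11} \]

we thus have

\[ t f(a) + (1-t) f(b) - f(ta + (1-t)b) \ge \left|\, |t f(a) + (1-t) f(b)| - |f(ta + (1-t)b)| \,\right|. \tag{11.12} \]

Lemma 2.1 provides

\[ f(a)\int_0^1 t\,dt + f(b)\int_0^1 (1-t)\,dt - \int_0^1 f(ta + (1-t)b)\,dt \ge \left| \int_0^1 |t f(a) + (1-t) f(b)|\,dt - \int_0^1 |f(ta + (1-t)b)|\,dt \right|. \tag{11.13} \]

Inequality (11.9) now follows from evaluation of the integrals and a change
of variables.
Similarly we have for any $\alpha, \beta \in [a, b]$ that

\[ \frac{f(\alpha) + f(\beta)}{2} - f\left(\frac{\alpha+\beta}{2}\right) \ge \left|\, \left| \frac{f(\alpha) + f(\beta)}{2} \right| - \left| f\left(\frac{\alpha+\beta}{2}\right) \right| \,\right|. \]

Set $\alpha = w_1$, $\beta = w_2$. Then

\[ \alpha + \beta = a + b \tag{11.14} \]

and by Lemma 2.1 we have

\[ \frac{1}{2}\left[ \int_0^1 f(w_1)\,dt + \int_0^1 f(w_2)\,dt \right] - m \ge \left| \int_0^1 \left| \frac{f(w_1) + f(w_2)}{2} \right| dt - |m| \right|. \]

By (11.14) and a change of variables, the previous result reduces to (11.10).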

When does the theorem provide an improvement on Hadamard’s inequality?
That is, when does strict inequality obtain in (11.9) or (11.10)? To
answer this, we consider the derivation of (11.11). Since $c = (c - d) + d$,
we have

\[ |c| \le |c - d| + |d| \tag{11.15} \]

or

\[ |c| - |d| \le |c - d|. \tag{11.16} \]

By symmetry we have also

\[ |d| - |c| \le |d - c|. \tag{11.17} \]

Combining the last two inequalities yields (11.11).
To have strict inequality in (11.11) we need strict inequality in both (11.16)
and (11.17), or equivalently in both (11.15) and

\[ |d| \le |d - c| + |c|. \]

Thus strict inequality occurs in (11.16) if and only if $d$, $c - d$ are of opposite
sign, and in (11.17) if and only if $c$, $d - c$ are of opposite sign. These conditions
are satisfied if and only if $c$ and $d$ are of opposite sign.
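
The sign condition for strict inequality in (11.11) is easy to confirm by brute force. The sketch below is only an illustration on a grid of integers (chosen so that the arithmetic is exact); the function names are hypothetical.

```python
def gap(c, d):
    """|c - d| - | |c| - |d| |, which is nonnegative by (11.11)."""
    return abs(c - d) - abs(abs(c) - abs(d))


def strict(c, d):
    """True when (11.11) holds with strict inequality for the pair (c, d)."""
    return gap(c, d) > 0


def opposite_sign(c, d):
    """True when c and d are nonzero and of opposite sign."""
    return c * d < 0


if __name__ == "__main__":
    pairs = [(c, d) for c in range(-5, 6) for d in range(-5, 6)]
    print(all(strict(c, d) == opposite_sign(c, d) for c, d in pairs))
```

When $c$ and $d$ have opposite signs the gap equals $2\min(|c|, |d|)$, which is the amount by which the refinement can improve on the original inequality at that point.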

It follows that strict inequality obtains in (11.12) if and only if

\[ t f(a) + (1-t) f(b) \quad \text{and} \quad f(ta + (1-t)b) \quad \text{are of opposite sign,} \]

that is, if and only if

\[ t f(a) + (1-t) f(b) > 0 > f(ta + (1-t)b). \tag{11.18} \]

Since an integrable convex function is continuous, (11.13) and so (11.9)
applies with strict inequality if and only if there exists $t \in [0, 1]$ for which (11.18)
holds.

Similarly strict inequality applies in (11.10) if and only if there exist
$\alpha, \beta \in [a, b]$ such that

\[ \frac{f(\alpha) + f(\beta)}{2} > 0 > f\left(\frac{\alpha+\beta}{2}\right). \]

Changes of variable yield the following condition.


Corollary 2.3 A necessary and sufficient condition for strict inequality in
(11.9) is that there exists $x \in [a,b]$ such that

\[ (b - x) f(a) + (x - a) f(b) > 0 > f(x). \]

A necessary and sufficient condition for strict inequality in (11.10) is that
there exists $x \in [a,b]$ such that

\[ f^s(x) > 0 > m. \]

Corollary 2.4 If in the context of Theorem 2.2 $f^s = f$, then

\[ f(a) - \mathcal{M} \ge \left|\, |f(a)| - \mathcal{M}_{\alpha \circ f}(a,b) \,\right| \tag{11.19} \]

and

\[ \mathcal{M} - m \ge \left|\, \mathcal{M}_{\alpha \circ f}(a,b) - |m| \,\right|. \tag{11.20} \]

A necessary and sufficient condition for strict inequality in (11.19) is that
there exists $x \in [a,b]$ such that

\[ f(x) < 0 < f(a). \]

A necessary and sufficient condition for strict inequality in (11.20) is that
there exists $x \in [a,b]$ such that

\[ f(x) > 0 > m. \]
These ideas have natural application to means. For an example, denote by
$A(a, b)$, $G(a, b)$ and $I(a, b)$ respectively the arithmetic, geometric and identric
means of two positive numbers $a$, $b$, given by

\[ A(a,b) = \frac{a+b}{2}, \qquad G(a,b) = \sqrt{ab} \]

and

\[ I(a,b) = \begin{cases} \dfrac{1}{e}\left(\dfrac{b^b}{a^a}\right)^{1/(b-a)} & \text{if } a \ne b, \\ a & \text{if } a = b. \end{cases} \]

These satisfy the geometric-identric-arithmetic (GIA) inequality

\[ G(a,b) \le I(a,b) \le A(a,b). \]

This follows from

\[ G(a,b) \le L(a,b) \le A(a,b) \]

(where $L(a,b)$ refers to the logarithmic mean), which was first proved by
Ostle and Terwilliger [6] and Carlson [1], [2], and

\[ L(a,b) \le I(a,b) \le A(a,b), \]

which was established by Stolarsky [7], [8].
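
The chain of means can be checked numerically. The sketch below uses the standard closed forms of the four means, with the logarithmic mean taken as $L(a,b) = (b-a)/(\ln b - \ln a)$ for $a \ne b$; the identric mean is evaluated through logarithms to avoid overflow in $b^b$. The function names are illustrative.

```python
import math


def arithmetic_mean(a, b):
    return (a + b) / 2


def geometric_mean(a, b):
    return math.sqrt(a * b)


def logarithmic_mean(a, b):
    """L(a, b) = (b - a) / (ln b - ln a), with L(a, a) = a."""
    return a if a == b else (b - a) / (math.log(b) - math.log(a))


def identric_mean(a, b):
    """I(a, b) = (1/e) (b^b / a^a)^(1/(b-a)), with I(a, a) = a."""
    if a == b:
        return a
    return math.exp((b * math.log(b) - a * math.log(a)) / (b - a) - 1)


def gia_chain_holds(a, b):
    """Check G <= L <= I <= A for positive a, b."""
    return (geometric_mean(a, b) <= logarithmic_mean(a, b)
            <= identric_mean(a, b) <= arithmetic_mean(a, b))


if __name__ == "__main__":
    for pair in [(0.5, 2.0), (1.0, 3.0), (2.0, 10.0)]:
        print(pair, gia_chain_holds(*pair))
```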
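The first part of the GIA inequality can be improved as follows.


Corollary 2.5 If $a \in (0,1]$ and $b \in [1, \infty)$ with $a \ne b$, then

\[ \frac{I(a,b)}{G(a,b)} \ge \exp\left[ \left| \frac{(\ln b)^2 + (\ln a)^2}{\ln((b/a)^2)} - \ln\left[ \left( a^a b^b e^{2-a-b} \right)^{1/(b-a)} \right] \right| \right] \ge 1. \tag{11.21} \]


Proof. For the convex function $f(x) = -\ln x$ ($x > 0$), the left-hand side of
(11.9) is

\[ -\frac{\ln a + \ln b}{2} + \frac{1}{b-a}\int_a^b \ln x\,dx = \frac{1}{b-a}\,[b \ln b - a \ln a - (b-a)] - \ln G(a,b) = \ln\left[ \frac{I(a,b)}{G(a,b)} \right]. \]

Since

\[ \frac{1}{\ln b - \ln a}\int_{-\ln b}^{-\ln a} |x|\,dx = \frac{(\ln b)^2 + (\ln a)^2}{\ln((b/a)^2)} \]

and

\[ \int_a^b |\ln x|\,dx = \ln\left[ a^a b^b e^{2-a-b} \right], \]

we have likewise for the same choice of $f$ that the right-hand side of
(11.9) is

\[ \left| \frac{(\ln b)^2 + (\ln a)^2}{\ln((b/a)^2)} - \ln\left[ \left( a^a b^b e^{2-a-b} \right)^{1/(b-a)} \right] \right|, \]

whence the desired result.


We note for reference the incidental result

\[ \mathcal{M}_{-\ln}(a,b) = -\ln I(a,b) \tag{11.22} \]

derived in the proof.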
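
A numerical spot check of this corollary is sketched below. It assumes that the exponent in (11.21) is the modulus of $\frac{(\ln b)^2 + (\ln a)^2}{\ln((b/a)^2)} - \frac{1}{b-a}\ln[a^a b^b e^{2-a-b}]$, which is how the right-hand side of (11.9) evaluates for $f = -\ln$; the function names and the small tolerance are illustrative.

```python
import math


def log_ratio_identric_geometric(a, b):
    """ln(I(a, b) / G(a, b)) for positive a != b."""
    ln_identric = (b * math.log(b) - a * math.log(a)) / (b - a) - 1
    ln_geometric = 0.5 * (math.log(a) + math.log(b))
    return ln_identric - ln_geometric


def lower_bound_exponent(a, b):
    """Assumed exponent in (11.21); requires 0 < a <= 1 <= b and a != b."""
    first = (math.log(b) ** 2 + math.log(a) ** 2) / math.log((b / a) ** 2)
    second = (a * math.log(a) + b * math.log(b) + 2 - a - b) / (b - a)
    return abs(first - second)


def corollary_holds(a, b, tol=1e-12):
    """Check ln(I/G) >= exponent >= 0, allowing for rounding error."""
    bound = lower_bound_exponent(a, b)
    return bound >= 0 and log_ratio_identric_geometric(a, b) >= bound - tol


if __name__ == "__main__":
    for pair in [(0.5, 2.0), (0.1, 1.5), (1.0, 4.0)]:
        print(pair, corollary_holds(*pair))
```

When $a = 1$ or $b = 1$ the function $-\ln x$ does not change sign on $[a, b]$, and the two sides agree up to rounding, which is why a tolerance is used in the comparison.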
For the first inequality in (11.21) to be strict, by Corollary 2.3 there needs
to exist $x$ with $1 < x < b$ for which

\[ (b - x)\ln a + (x - a)\ln b < 0. \]

Since the left-hand side is strictly increasing in $x$, this condition can be
satisfied if and only if the left-hand side is strictly negative for $x = 1$, that is,
we require

\[ (b - 1)\ln a + (1 - a)\ln b < 0. \tag{11.23} \]

Because $b > 1$, we have $b - 1 > \ln b$ and so $(b - 1)/a - \ln b > 0$, since
$0 < a \le 1$. Thus the left-hand side of (11.23) is strictly increasing in $a$. It
tends to $-\infty$ as $a \to 0$ and is zero for $a = 1$. Accordingly (11.23) holds
whenever $0 < a < 1 < b$.
The second part of the GIA inequality may also be improved. 


Corollary 2.6. If $0 < a < b < \infty$, then

\[ \frac{A(a,b)}{I(a,b)} \ge \exp\left[ \left| \frac{1}{b-a}\int_a^b \left| \ln\sqrt{x(a+b-x)} \right| dx - \left| \ln\left(\frac{a+b}{2}\right) \right| \right| \right] \ge 1. \tag{11.24} \]

Proof. For the convex function $f(x) = -\ln x$ ($x > 0$) we have that

\[ \mathcal{M} - m = \ln\left(\frac{a+b}{2}\right) - \ln I(a,b) = \ln\left[ \frac{A(a,b)}{I(a,b)} \right] \]

and that the right-hand side of (11.10) is

\[ \left| \frac{1}{b-a}\int_a^b \left| \ln\sqrt{x(a+b-x)} \right| dx - \left| \ln\left(\frac{a+b}{2}\right) \right| \right|. \]

The stated result follows from (11.10).


By Corollary 2.3, a necessary and sufficient condition for the first inequality
in (11.24) to be strict is that there should exist $x \in [a,b]$ such that

\[ \ln[x(a+b-x)] < 0 < \ln\left(\frac{a+b}{2}\right), \]

that is,

\[ x(a+b-x) < 1 < (a+b)/2. \]

The leftmost term is minimized for $x = a$ and $x = b$, so the condition reduces
to

\[ ab < 1 < (a+b)/2 \quad \text{or} \quad 2 - b < a < 1/b. \]

Since $2 - b < 1/b$ for $b \ne 1$, there are always values of $a$ for which this
condition is satisfied.

Similar analyses may be made for the refinements of inequalities derived
in the remainder of this chapter.


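11.3 Inequalities for $G_f$ and $H_f$


Our first result in this section provides minorants for the difference between
the two sides of the first inequality and the difference between the outermost
quantities in (11.2).


Theorem 3.1. Suppose $I$ is an interval of real numbers with $a, b \in I$ and
$a < b$. Then if $f : I \to \mathbb{R}$ is convex, we have for $t \in [0,1]$ that

\[ G_f(t) - H_f(t) \ge \left| \mathcal{M}_\alpha(f(u_1), f(u_2)) - H_{\alpha \circ f}(t) \right| \tag{11.25} \]

and

\[ H_f(t) - m \ge \left| \mathcal{M}_{\alpha \circ f^s \circ y_t}(a,b) - |m| \right|. \tag{11.26} \]


Proof. We have

\[ G_f(t) = \frac{1}{2}\,[f(u_1) + f(u_2)] = M_f(u_1, u_2), \]
\[ H_f(t) = \mathcal{M}_{f \circ y_t}(a,b) = \mathcal{M}_f(u_1, u_2), \]
\[ m_f(u_1, u_2) = m_f(a,b) = m, \]

so for $t \in [0,1]$ application of Theorem 2.2 to $f$ on $(u_1, u_2)$ provides

\[ G_f(t) - H_f(t) \ge \left| \mathcal{M}_\alpha(f(u_1), f(u_2)) - \mathcal{M}_{\alpha \circ f}(u_1, u_2) \right|, \tag{11.27} \]
\[ H_f(t) - m \ge \left| \mathcal{M}_{\alpha \circ f^s}(u_1, u_2) - |m| \right|. \tag{11.28} \]

Since $u_2 - u_1 = t(b - a)$, we have for $t \in (0, 1]$ that

\[ \mathcal{M}_{\alpha \circ f}(u_1, u_2) = \frac{1}{b-a}\int_{u_1}^{u_2} |f(y)|\,\frac{dy}{t} = \frac{1}{b-a}\int_a^b |f(y_t(x))|\,dx = H_{\alpha \circ f}(t), \]

so (11.27) yields (11.25) for $t \in (0, 1]$. As (11.25) also holds for $t = 0$, we have
the first part of the theorem. Using $u_1 + u_2 = a + b$, we derive the second
part similarly from $\mathcal{M}_{\alpha \circ f^s}(u_1, u_2) = \mathcal{M}_{\alpha \circ f^s \circ y_t}(a, b)$.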


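Our next result provides a minorant for the difference between the two
sides of the second inequality in (11.3) and a corresponding result for the
third inequality.

Theorem 3.2. Suppose the conditions of Theorem 3.1 hold. Then

\[ \frac{m + M}{2} - \mathcal{M} \ge \left| \mathcal{M}_\alpha(m, M) - \int_0^1 |G_f(t)|\,dt \right| \tag{11.29} \]

and

\[ \mathcal{M} - \frac{1}{2}\left[ f\left(\frac{3a+b}{4}\right) + f\left(\frac{a+3b}{4}\right) \right] \ge \frac{1}{2}\left| \int_0^1 |G_f(t) + G_f(1-t)|\,dt - \left| f\left(\frac{3a+b}{4}\right) + f\left(\frac{a+3b}{4}\right) \right| \right|. \tag{11.30} \]


Proof. First, we observe that

\[ \int_0^1 G_f(t)\,dt = \frac{1}{2}\left[ \int_0^1 f(u_1)\,dt + \int_0^1 f(u_2)\,dt \right] = \frac{1}{2}\left[ \frac{2}{b-a}\int_a^{(a+b)/2} f(x)\,dx + \frac{2}{b-a}\int_{(a+b)/2}^b f(x)\,dx \right] = \mathcal{M}. \tag{11.31} \]

Application of Theorem 2.2 to $G_f$ on $[0,1]$ provides

\[ \frac{G_f(0) + G_f(1)}{2} - \int_0^1 G_f(t)\,dt \ge \left| \mathcal{M}_\alpha(G_f(0), G_f(1)) - \mathcal{M}_{\alpha \circ G_f}(0,1) \right| \]

and

\[ \mathcal{M} - G_f(1/2) \ge \left| \mathcal{M}_{\alpha \circ G_f^s}(0,1) - \left| G_f\left(\tfrac{1}{2}\right) \right| \right|. \]

By (11.31) and the relations $G_f(0) = m$, $G_f(1) = M$, we have the stated
results.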


11.4 More on the identric mean 


For $a, b > 0$, define $\gamma_{a,b} : [0,1] \to \mathbb{R}$ by

\[ \gamma_{a,b}(t) := G(u_1, u_2), \tag{11.32} \]

where, as before, $G(x, y)$ denotes the geometric mean of the positive numbers
$x$, $y$.


Theorem 4.1 The mapping $\gamma_{a,b}$ possesses the following properties:
(a) $\gamma_{a,b}$ is concave on $[0,1]$;
(b) $\gamma_{a,b}$ is monotone nonincreasing on $[0,1]$, with

\[ \gamma_{a,b}(1) = G(a,b) \quad \text{and} \quad \gamma_{a,b}(0) = A(a,b); \]

(c) for $t \in [0,1]$

\[ \gamma_{a,b}(t) \le I(u_1, u_2) \le A(a,b); \]

(d) we have

\[ A\left(\frac{3a+b}{4}, \frac{a+3b}{4}\right) \ge G\left(\frac{3a+b}{4}, \frac{a+3b}{4}\right) \ge I(a,b) \ge G(A(a,b), G(a,b)) \ge G(a,b); \]

(e) for $t \in [0,1]$ we have

\[ 1 \le \frac{A(a,b)}{I(u_1, u_2)} \le \frac{I(u_1, u_2)}{\gamma_{a,b}(t)}. \tag{11.33} \]


Proof. We have readily that for $t \in [0,1]$

\[ \gamma_{a,b}(t) = \exp[-G_{-\ln}(t)], \]
\[ H_{-\ln}(t) = -\ln I(u_1, u_2). \]

Since the map $x \mapsto \exp(-x)$ is order reversing, (b)-(e) follow from Theorem
B(ii), (iii). It remains only to establish (a).
Since $du_i/dt = (-1)^i (b - a)/2$ for $i = 1, 2$ and $u_2 - u_1 = t(b - a)$, we have
from (11.32) that

\[ \frac{d\gamma_{a,b}}{dt} = \frac{b-a}{4} \cdot \frac{u_1 - u_2}{(u_1 u_2)^{1/2}} = -\frac{t(b-a)^2}{4(u_1 u_2)^{1/2}} \]

and so

\[ \frac{d^2\gamma_{a,b}}{dt^2} = -\frac{(b-a)^2}{8(u_1 u_2)^{1/2}} - \frac{(b-a)^2}{16}\left[ u_1^{1/2} u_2^{-3/2} + u_1^{-3/2} u_2^{1/2} \right] \le 0. \]

This establishes (a).


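We now apply Theorems 3.1 and 3.2 to obtain further information about
the identric mean. For $t \in [0,1]$, put

\[ \eta_{a,b}(t) := I(u_1, u_2). \]

Because

\[ G_{-\ln}(t) = -\ln \gamma_{a,b}(t), \qquad H_{-\ln}(t) = -\ln \eta_{a,b}(t), \]

(11.25) provides

\[ \ln \eta_{a,b}(t) - \ln \gamma_{a,b}(t) = G_{-\ln}(t) - H_{-\ln}(t) \ge L_{a,b}(t), \]

where

\[ L_{a,b}(t) := \left| \mathcal{M}_\alpha(\ln u_1, \ln u_2) - H_{\alpha \circ \ln}(t) \right|. \]

This yields

\[ \frac{\eta_{a,b}(t)}{\gamma_{a,b}(t)} \ge \exp[L_{a,b}(t)] \ge 1 \quad \text{for } t \in [0,1]. \]

From (11.26) we derive

\[ \ln\left(\frac{a+b}{2}\right) - \ln \eta_{a,b}(t) \ge \left| \frac{1}{b-a}\int_a^b \left| \ln\sqrt{y_t(x)\,y_t(a+b-x)} \right| dx - \left| \ln\left(\frac{a+b}{2}\right) \right| \right|, \]

which gives

\[ \frac{A(a,b)}{\eta_{a,b}(t)} \ge \exp\left[ \left| \frac{1}{b-a}\int_a^b \left| \ln\sqrt{y_t(x)\,y_t(a+b-x)} \right| dx - \left| \ln\left(\frac{a+b}{2}\right) \right| \right| \right] \ge 1 \quad \text{for } t \in [0,1]. \]

Also application of (11.29) to the convex function $-\ln$ yields

\[ \ln I(a,b) - \ln[G(A(a,b), G(a,b))] \ge K_{a,b}, \]

where

\[ K_{a,b} := \left| \mathcal{M}_\alpha(\ln A(a,b), \ln G(a,b)) - \int_0^1 |\ln \gamma_{a,b}(t)|\,dt \right|. \]

Hence

\[ \frac{I(a,b)}{G(A(a,b), G(a,b))} \ge \exp[K_{a,b}] \ge 1. \]

Finally, applying (11.30) to the convex mapping $-\ln$ provides

\[ \ln G\left(\frac{3a+b}{4}, \frac{a+3b}{4}\right) - \ln I(a,b) \ge \frac{1}{2}\left| \int_0^1 \left| \ln[\gamma_{a,b}(t)\,\gamma_{a,b}(1-t)] \right| dt - \left| \ln\left[ \left(\frac{3a+b}{4}\right)\left(\frac{a+3b}{4}\right) \right] \right| \right| = M_{a,b}, \]

where

\[ M_{a,b} := \left| \int_0^1 \left| \ln G(\gamma_{a,b}(t), \gamma_{a,b}(1-t)) \right| dt - \left| \ln G\left(\frac{3a+b}{4}, \frac{a+3b}{4}\right) \right| \right|. \]

Hence

\[ \frac{G\left(\frac{3a+b}{4}, \frac{a+3b}{4}\right)}{I(a,b)} \ge \exp[M_{a,b}] \ge 1. \]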


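11.5 The mapping $L_f$


We now consider further the properties of the univariate mapping $L_f$ defined
in the introduction. First we introduce the useful auxiliaries

\[ A_f(t) := \frac{1}{b-a}\int_a^b f(u)\,dx, \]

\[ B_f(t) := \frac{1}{b-a}\int_a^b f(v)\,dx \]

for $t \in [0, 1]$. We have immediately from (11.4) that

\[ L_f(t) = \frac{1}{2}\,[A_f(t) + B_f(t)]. \]

The following property is closely connected with the second part of the
Hadamard inequality.


Proposition 5.1. Suppose $a < b$ and $f : [a,b] \to \mathbb{R}$ is convex. Then

\[ L_f(t) \le \frac{1}{2}\left[ \frac{M + M_f(t)}{2} + G_f(t) \right] \le \frac{M + M_f(t)}{2} \le M \tag{11.34} \]

for all $t \in [0, 1]$.


Proof. For $t \in [0,1]$ we have

\[ A_f(t) = \frac{1}{w_1 - a}\int_a^{w_1} f(u)\,du \quad \text{and} \quad B_f(t) = \frac{1}{b - w_2}\int_{w_2}^b f(u)\,du. \]

Substituting $A_f(t)$, $B_f(t)$ for the leftmost terms in the known inequalities

\[ \frac{1}{w_1 - a}\int_a^{w_1} f(u)\,du \le \frac{1}{2}\left[ \frac{f(a) + f(w_1)}{2} + f\left(\frac{a+w_1}{2}\right) \right] \le \frac{f(a) + f(w_1)}{2}, \]

\[ \frac{1}{b - w_2}\int_{w_2}^b f(u)\,du \le \frac{1}{2}\left[ \frac{f(w_2) + f(b)}{2} + f\left(\frac{w_2+b}{2}\right) \right] \le \frac{f(w_2) + f(b)}{2} \]

respectively and adding, gives by (11.4) that

\[ L_f(t) \le \frac{1}{4}\left[ M + \frac{1}{2}\,(f(w_1) + f(w_2)) + f\left(\frac{a+w_1}{2}\right) + f\left(\frac{w_2+b}{2}\right) \right] \le \frac{1}{2}\left[ M + \frac{1}{2}\,(f(w_1) + f(w_2)) \right]. \]

The first two inequalities in (11.34) follow from (11.6) and (11.7) and the
final inequality from (11.8).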


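The first inequality in (11.5) is improved by the following proposition. We
introduce the auxiliary variable

\[ z = z(t,x) := (1-t)x + t\,\frac{a+b}{2} \quad \text{for } x \in [a,b] \text{ and } t \in [0,1]. \]


Proposition 5.2. Under the assumptions of Proposition 5.1 we have

\[ L_f(t) - H_f(1-t) \ge \left| \frac{1}{(1-t)(b-a)}\int_a^{w_1} \left| \frac{f(u) + f(u + t(b-a))}{2} \right| du - H_{\alpha \circ f}(1-t) \right| \ge 0 \tag{11.35} \]

for all $t \in [0, 1)$.

Proof. Put

\[ z = (u+v)/2 = \phi_t((a+b)/2, x) \quad \text{for } x \in [a,b] \text{ and } t \in [0,1]. \]

By (11.11) and the convexity of $f$,

\[ \frac{f(u) + f(v)}{2} - f(z) = \left| \frac{f(u) + f(v)}{2} - f(z) \right| \ge \left|\, \left| \frac{f(u) + f(v)}{2} \right| - |f(z)| \,\right| \ge 0 \]

for all $x \in [a,b]$ and $t \in [0, 1]$.
By Lemma 2.1, integration with respect to $x$ over $[a,b]$ provides

\[ L_f(t) - H_f(1-t) \ge \left| \frac{1}{b-a}\int_a^b \left| \frac{f(u) + f(v)}{2} \right| dx - H_{\alpha \circ f}(1-t) \right| \ge 0. \]

Since

\[ \frac{1}{b-a}\int_a^b \left| \frac{f(u) + f(v)}{2} \right| dx = \frac{1}{(1-t)(b-a)}\int_a^{w_1} \left| \frac{f(u) + f(u + t(b-a))}{2} \right| du, \]

inequality (11.35) is proved.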


Remark 5.3. We can apply Theorem 2.2 to $L_f$ to provide results similar to
Theorems 3.1 and 3.2. In fact we may readily verify that the components $A_f$
and $B_f$ are themselves convex and so subject to Theorem 2.2.


We now apply the above to obtain results for the identric mean. We may
compute $A_f$, $B_f$ for $f = -\ln$ to derive

\[ A_{-\ln}(t) = \frac{1}{w_1 - a}\int_a^{w_1} (-\ln u)\,du = -\ln I(w_1, a), \]

\[ B_{-\ln}(t) = \frac{1}{b - w_2}\int_{w_2}^b (-\ln u)\,du = -\ln I(b, w_2). \]

Thus

\[ L_{-\ln}(t) = \frac{1}{2}\,[A_{-\ln}(t) + B_{-\ln}(t)] = -\ln \zeta_{a,b}(t), \]

where the map $\zeta_{a,b} : [0,1] \to \mathbb{R}$ is defined by

\[ \zeta_{a,b}(t) := G(I(a, w_1), I(w_2, b)). \]


Theorem 5.4. We have the following.
(a) for all $t \in [0,1]$

\[ \gamma_{a,b}(t) \ge \zeta_{a,b}(t) \ge [I(a,b)]^{1-t}\,[G(a,b)]^t \ge G(a,b); \]

(b) for all $t \in [0,1]$

\[ \eta_{a,b}(1-t) \ge \zeta_{a,b}(t) \quad \text{and} \quad G(\eta_{a,b}(t), \eta_{a,b}(1-t)) \ge \zeta_{a,b}(t). \]


Proof. Since

\[ \zeta_{a,b}(t) = \exp[-L_{-\ln}(t)] \quad \text{for all } t \in [0,1] \]

and the map $x \mapsto \exp(-x)$ is order reversing, (a) and (b) follow from
Theorem C, parts 2 and 3.


Remark 5.5. Similar results may be obtained from Propositions 5.1 and 
5.2. 


References

1. B. C. Carlson, Some inequalities for hypergeometric functions, Proc. Amer. Math. Soc.
17 (1966), 32-39.

2. B. C. Carlson, The logarithmic mean, Amer. Math. Monthly 79 (1972), 615-618.

3. S. S. Dragomir, D. S. Milošević and J. Sándor, On some refinements of Hadamard’s
inequalities and applications, Univ. Belgrad Publ. Elek. Fak. Sci. Math. 4 (1993),
21-24.

4. S. S. Dragomir and C. E. M. Pearce, Hermite-Hadamard Inequalities,
RGMIA Monographs, Victoria University, Melbourne (2000), online:
http://rgmia.vu.edu.au/monographs.

5. S. S. Dragomir and E. Pearce, A refinement of the second part of Hadamard’s
inequality, with applications, in Sixth Symposium on Mathematics & its Applications,
Technical University of Timisoara (1996), 1-9.

6. B. Ostle and H. L. Terwilliger, A comparison of two means, Proc. Montana Acad. Sci.
17 (1957), 69-70.

7. K. B. Stolarsky, Generalizations of the logarithmic mean, Math. Mag. 48 (1975),
87-92.

8. K. B. Stolarsky, The power and generalized logarithmic means, Amer. Math.
Monthly 87 (1980), 545-548.


Chapter 12 


Estimating the size of correcting codes 
using extremal graph problems 


Sergiy Butenko, Panos Pardalos, Ivan Sergienko, Vladimir Shylo 
and Petro Stetsyuk 


Abstract Some of the fundamental problems in coding theory can be formu- 
lated as extremal graph problems. Finding estimates of the size of correcting 
codes is important from both theoretical and practical perspectives. We solve 
the problem of finding the largest correcting codes using previously developed 
algorithms for optimization problems in graphs. We report new exact solu- 
tions and estimates. 


Key words: Maximum independent set, graph coloring, error-correcting 
codes, coding theory, combinatorial optimization 


12.1 Introduction 


Let a positive integer $l$ be given. For a binary vector $u \in B^l$ denote by $F_e(u)$
the set of all vectors (not necessarily of dimension $l$) which can be obtained
from $u$ as a consequence of a certain error $e$, such as deletion or transposition
of bits. A subset $C \subseteq B^l$ is said to be an $e$-correcting code if $F_e(u) \cap F_e(v) = \emptyset$
for all $u, v \in C$, $u \ne v$. In this chapter we consider the following cases for the
error $e$.


• Single deletion ($e = 1d$): $F_{1d}(u) \subseteq B^{l-1}$ and all elements of $F_{1d}(u)$ are
obtained by deletion of one of the components of $u$. For example, if $l = 4$
and $u = 0101$ then $F_{1d}(u) = \{101, 001, 011, 010\}$. See [25] for a survey of
single-deletion-correcting codes.

• Two-deletion ($e = 2d$): $F_{2d}(u) \subseteq B^{l-2}$ and all elements of $F_{2d}(u)$ are
obtained by deletion of two of the components of $u$. For $u = 0101$ we have
$F_{2d}(u) = \{00, 01, 10, 11\}$.

• Single transposition, excluding the end-around transposition ($e = 1t$):
$F_{1t}(u) \subseteq B^l$ and all elements of $F_{1t}(u)$ are obtained by transposition of a
neighboring pair of components in $u$. For example, if $l = 5$ and $u = 11100$
then $F_{1t}(u) = \{11100, 11010\}$.

• Single transposition, including the end-around transposition ($e = 1et$):
$F_{1et}(u) \subseteq B^l$ and all elements of $F_{1et}(u)$ are obtained by transposition
of a neighboring pair of components in $u$, where the first and the last
components are also considered as neighbors. For $l = 5$ and $u = 11100$ we
obtain $F_{1et}(u) = \{11100, 11010, 01101\}$.

• One error on the Z-channel ($e = 1z$): $F_{1z}(u) \subseteq B^l$ and all elements
of $F_{1z}(u)$ are obtained by possibly changing one of the nonzero components
of $u$ from 1 to 0. If $l = 5$ and $u = 11100$ then
$F_{1z}(u) = \{11100, 01100, 10100, 11000\}$. The codes correcting one error on the
Z-channel represent the simplest case of asymmetric codes.
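
The five error sets can be sketched in a few lines of Python. This is only an illustration of the definitions above, not code from the chapter: binary vectors are represented as strings of "0" and "1", the function names are hypothetical, and the transposition sets follow the text literally (so $u$ itself appears only when some swapped pair has equal components).

```python
def single_deletion(u):
    """F_1d(u): all strings obtained by deleting one component of u."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}


def two_deletion(u):
    """F_2d(u): all strings obtained by deleting two components of u."""
    return {w for v in single_deletion(u) for w in single_deletion(v)}


def transposition(u, end_around=False):
    """F_1t(u), or F_1et(u) when end_around is True: swap one neighboring pair."""
    n = len(u)
    pairs = [(i, i + 1) for i in range(n - 1)]
    if end_around and n > 1:
        pairs.append((n - 1, 0))  # first and last components count as neighbors
    result = set()
    for i, j in pairs:
        bits = list(u)
        bits[i], bits[j] = bits[j], bits[i]
        result.add("".join(bits))
    return result


def z_channel(u):
    """F_1z(u): u itself, or u with exactly one 1 changed to 0."""
    return {u} | {u[:i] + "0" + u[i + 1:] for i, b in enumerate(u) if b == "1"}
```

Calling these functions on the vectors used in the list, for instance `single_deletion("0101")` or `transposition("11100", end_around=True)`, returns the example sets given there.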


Our problem of interest here is to find the largest correcting codes. It appears 
that this problem can be formulated in terms of extremal graph problems as 
follows [24]. 

Consider a simple undirected graph $G = (V, E)$, where $V = \{1, \ldots, n\}$
is the set of vertices and $E$ is the set of edges. The complement graph of $G$
is the graph $\bar{G} = (V, \bar{E})$, where $\bar{E}$ is the complement of $E$. Given a subset
$W \subseteq V$, we denote by $G(W)$ the subgraph induced by $W$ on $G$. A subset
$I \subseteq V$ is called an independent set (stable set, vertex packing) if the edge set
of the subgraph induced by $I$ is empty. An independent set is maximal if it
is not a subset of any larger independent set and maximum if there are no
larger independent sets in the graph. The independence number $\alpha(G)$ (also
called the stability number) is the cardinality of a maximum independent set
in $G$. A subset $C \subseteq V$ is called a clique if $G(C)$ is a complete graph.

Consider a graph $G_l$ having a vertex for every vector $u \in B^l$, with an
edge joining the vertices corresponding to $u, v \in B^l$, $u \neq v$, if and only if
$F_e(u) \cap F_e(v) \neq \emptyset$. Then a correcting code corresponds to an independent
set in $G_l$. Hence the largest $e$-correcting code can be found by solving the
maximum independent set problem in the considered graph. Note that this
problem could be equivalently formulated as the maximum clique problem in
the complement graph of $G_l$.
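
As an illustration of this construction, the following sketch builds the graph for single deletions by brute force and checks whether a set of words is a code. It assumes strings as binary vectors and is practical only for small $l$; the helper names are hypothetical.

```python
from itertools import combinations, product


def single_deletion(u):
    """F_1d(u): all strings obtained by deleting one component of u."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}


def conflict_graph(l, error_set):
    """Adjacency sets of the graph on B^l: u ~ v iff their error sets intersect."""
    words = ["".join(bits) for bits in product("01", repeat=l)]
    images = {u: error_set(u) for u in words}
    adj = {u: set() for u in words}
    for u, v in combinations(words, 2):
        if images[u] & images[v]:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_code(adj, words):
    """True iff the words are pairwise nonadjacent, i.e. form a correcting code."""
    return all(v not in adj[u] for u, v in combinations(words, 2))


adj = conflict_graph(3, single_deletion)
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
```

For $l = 3$ this gives 8 vertices and 19 edges, and `is_code(adj, ["000", "111"])` holds because the two words have disjoint deletion sets.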

Another discrete optimization problem which we will use to obtain lower 
bounds for asymmetric codes is the graph coloring problem, which is formu- 
lated as follows. A legal (proper) coloring of G is an assignment of colors to 
its vertices so that no pair of adjacent vertices has the same color. A color- 
ing induces naturally a partition of the vertex set such that the elements of 
each set in the partition are pairwise nonadjacent; these sets are precisely the 
subsets of vertices being assigned the same color. If there exists a coloring 
of G that uses no more than k colors, we say that G admits a k-coloring 
($G$ is $k$-colorable). The minimal $k$ for which $G$ admits a $k$-coloring is called
the chromatic number and is denoted by $\chi(G)$. The graph coloring problem
is to find $\chi(G)$ as well as the partition of vertices induced by a $\chi(G)$-coloring.

The maximum independent set (clique) and the graph coloring problems 
are NP-hard [15]; moreover, they are associated with a series of recent results 
about hardness of approximations. Arora and Safra [2] proved that for some
positive $\epsilon$ the approximation of the maximum clique within a factor of $n^{\epsilon}$ is
NP-hard. Håstad [16] has shown that in fact for any $\delta > 0$ the maximum
clique is hard to approximate in polynomial time within a factor $n^{1-\delta}$.
Similar approximation complexity results hold for the graph coloring problem
as well. Garey and Johnson [14] have shown that obtaining colorings using
$s\chi(G)$ colors, where $s < 2$, is NP-hard. It has been shown by Lund and
Yannakakis [18] that $\chi(G)$ is hard to approximate within $n^{\epsilon}$ for some $\epsilon > 0$, and
Feige and Kilian [13] have shown that for any $\delta > 0$ the chromatic number is
hard to approximate within a factor of $n^{1-\delta}$, unless NP $\subseteq$ ZPP. These results
together with practical evidence [17] suggest that the maximum independent 
set (clique) and coloring problems are hard to solve even in graphs of mod- 
erate sizes. Therefore heuristics are used to solve practical instances of these 
problems. References [3] and [19] provide extensive reviews of the maximum 
clique and graph coloring problems, respectively. 

In this chapter, using efficient approaches for the maximum independent 
set and graph coloring problems, we have improved some of the previously 
known lower bounds for asymmetric codes and found the exact solutions for 
some of the instances. 

The remainder of this chapter is organized as follows. In Section 12.2 we 
find lower bounds and exact solutions for the largest codes using efficient 
algorithms for the maximum independent set problem. In Section 12.3 a 
graph coloring heuristic and the partitioning method are utilized in order to 
obtain better lower bounds for some asymmetric codes. Finally, concluding 
remarks are made in Section 12.4. 


12.2 Finding lower bounds and exact solutions 
for the largest code sizes using a maximum 
independent set problem 


In this section we summarize the results obtained in [5, 21]. We start with 
the following global optimization formulation for the maximum independent 
set problem. 


Theorem 1 ([1]). The independence number of $G$ satisfies the following
equality:

$$\alpha(G) = \max_{0 \le x_i \le 1,\ i = 1, \ldots, n} \ \sum_{i=1}^{n} x_i \prod_{(i,j) \in E} (1 - x_j). \tag{12.1}$$




This formulation is valid if instead of $[0,1]^n$ we use $\{0,1\}^n$ as the feasible
region, thus obtaining an integer 0-1 programming problem. In problem
(12.1), for each vertex $i$ there is a corresponding Boolean expression:

$$r_i = x_i \wedge \Big[ \bigwedge_{(i,j) \in E} \bar{x}_j \Big].$$

Therefore the problem of finding a maximum independent set can be reduced
to the problem of finding a Boolean vector $x^*$ which maximizes the
number of “true” values among $r_i$, $i = 1, \ldots, n$:

$$x^* = \arg\max \sum_{i=1}^{n} \Big| x_i \wedge \Big[ \bigwedge_{(i,j) \in E} \bar{x}_j \Big] \Big|. \tag{12.2}$$
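
A small brute-force check can make the 0-1 version of this formulation concrete. The sketch below assumes the objective of (12.1) is $\sum_i x_i \prod_{(i,j) \in E} (1 - x_j)$ and uses a hypothetical test graph (the 5-cycle, whose independence number is 2); it enumerates all 0-1 vectors, so it is meant only for tiny graphs.

```python
from itertools import product


def objective(x, adj):
    """Value of sum_i x_i * prod_{(i,j) in E} (1 - x_j) at the vector x."""
    total = 0
    for i, nbrs in adj.items():
        term = x[i]
        for j in nbrs:
            term *= 1 - x[j]
        total += term
    return total


def is_independent(x, adj):
    """True iff no edge has both endpoints selected by the 0-1 vector x."""
    return all(not (x[i] and x[j]) for i in adj for j in adj[i])


# Hypothetical test graph: the 5-cycle 0-1-2-3-4-0.
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

vectors = list(product((0, 1), repeat=5))
best_poly = max(objective(x, adj) for x in vectors)
alpha = max(sum(x) for x in vectors if is_independent(x, adj))
```

For a 0-1 vector the objective counts the vertices $i$ with $x_i = 1$ whose neighbors are all 0, which is the number of "true" values among the $r_i$; on this graph both `best_poly` and `alpha` equal 2.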


To apply local search techniques to the above problem one needs to define 
a proper neighborhood. We define the neighborhood on the set of all maximal 
independent sets as follows. 

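For each $j_q \in I$, $q = 1, \ldots, |I|$,

$$\mathrm{List}_{j_q} = \Big\{ i \notin I : r_i = \Big[ x_i \wedge \bigwedge_{(i,k) \in E} \bar{x}_k \Big] \text{ has exactly 2 literals with value 0, namely } x_i = 0 \text{ and } \bar{x}_{j_q} = 0 \Big\}.$$

If the set $\mathrm{List}_{j_q}$ is not empty, let $I(G(\mathrm{List}_{j_q}))$ be an arbitrary maximal
independent set in $G(\mathrm{List}_{j_q})$. Then sets of the form

$$(I - \{j_q\}) \cup I(G(\mathrm{List}_{j_q})), \quad q = 1, \ldots, |I|,$$

are maximal independent sets in $G$. Therefore the neighborhood of a maximal
independent set $I$ in $G$ can be defined as follows:

$$O(I) = \{ (I - \{j_q\}) \cup I(G(\mathrm{List}_{j_q})),\ j_q \in I,\ q = 1, \ldots, |I| \}.$$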

We have the following algorithm to find maximal independent sets: 


1. Given a randomly generated Boolean vector x, find an appropriate initial 
maximal independent set I. 

2. Find a maximal independent set from the neighborhood (defined for max- 
imal independent sets) of I, which has the largest cardinality. 
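
The two steps can be sketched as follows. This is an illustrative reading rather than the authors' implementation: $\mathrm{List}_j$ is taken to be the set of vertices outside $I$ whose only neighbor in $I$ is $j$, the maximal independent sets are built greedily, and repeating step 2 until no larger neighbor exists is an added assumption. The function names are hypothetical.

```python
def greedy_maximal(adj, order):
    """Scan vertices in the given order, keeping each one that stays independent."""
    chosen = set()
    for v in order:
        if not (adj[v] & chosen):
            chosen.add(v)
    return chosen


def neighborhood(adj, ind_set):
    """Sets (I - {j}) | I(G(List_j)) for every j in I with nonempty List_j."""
    result = []
    for j in sorted(ind_set):
        # List_j: vertices outside I whose only neighbor inside I is j.
        list_j = {i for i in adj if i not in ind_set and adj[i] & ind_set == {j}}
        if list_j:
            result.append((ind_set - {j}) | greedy_maximal(adj, sorted(list_j)))
    return result


def local_search(adj, order):
    """Step 1: initial maximal set; step 2 (repeated): move to the largest neighbor."""
    current = greedy_maximal(adj, order)
    while True:
        best = max(neighborhood(adj, current), key=len, default=current)
        if len(best) <= len(current):
            return current
        current = best


# Hypothetical test graph: a star with center 0 and leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
found = local_search(star, [0, 1, 2, 3])
```

Starting from the maximal independent set $\{0\}$, the single neighbor $\{1, 2, 3\}$ is larger, so the search moves there and stops.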

We tested the proposed algorithm with the following graphs arising from
coding theory. These graphs are constructed as discussed in Section 12.1 and 
can be downloaded from [24]: 


• Graphs From Single-Deletion-Correcting Codes (1dc);

• Graphs From Two-Deletion-Correcting Codes (2dc);

• Graphs From Codes For Correcting a Single Transposition, Excluding the
End-Around Transposition (1tc);

• Graphs From Codes For Correcting a Single Transposition, Including the
End-Around Transposition (1et);

• Graphs From Codes For Correcting One Error on the Z-Channel (1zc).


The results of the experiments are summarized in Table 12.1. In this table, 
the columns “Graph,” “n” and “|E|” represent the name of the graph, the 
number of its vertices and its number of edges. This information is available 
from [24]. The column “Solution found” contains the size of the largest inde- 
pendent sets found by the algorithm over 10 runs. As one can see the results 
are very encouraging. In fact, for all of the considered instances they were at 
least as good as the best previously known estimates. 


Table 12.1 Lower bounds obtained 


Graph      n       |E|       Solution found
1dc128     128     1471      16
1dc256     256     3839      30
1dc512     512     9727      52
1dc1024    1024    24063     94
1dc2048    2048    58367     172
2dc128     128     5173      5
2dc256     256     17183     7
2dc512     512     54895     11
2dc1024    1024    169162    16
1tc64      64      192       20
1tc128     128     512       38
1tc256     256     1312      63
1tc512     512     3264      110
1tc1024    1024    7936      196
1tc2048    2048    18944     352
1et64      64      264       18
1et128     128     672       28
1et256     256     1664      50
1et512     512     4032      100
1et1024    1024    9600      171
1et2048    2048    22528     316
1zc128     128     1120      18
1zc256     256     2816      36
1zc512     512     6912      62
1zc1024    1024    16640     112
1zc2048    2048    39424     198
1zc4096    4096    92160     379

12.2.1 Finding the largest correcting codes


The proposed exact algorithm consists of the following steps: 


0. Preprocessing: finding and removing the set of isolated cliques;
1. Finding a partition which divides the graph into disjoint cliques;
2. Finding an approximate solution;
3. Finding an upper bound;
4. A Branch-and-Bound algorithm.


Below we give more detail on each of these steps. 


0. Preprocessing: finding and removing the set of isolated cliques 
We will call a clique $C$ isolated if it contains a vertex $i$ with the property
$|N(i)| = |C| - 1$, where $N(i)$ is the set of neighbors of $i$. Using the fact
that if $C$ is an isolated clique, then $\alpha(G) = \alpha(G - G(C)) + 1$, we iteratively
find and remove all isolated cliques in the graph. After that, we consider
each connected component of the obtained graph separately.
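
A minimal sketch of this preprocessing step, assuming the graph is stored as a dictionary of neighbor sets (the function name is hypothetical). A vertex $i$ certifies an isolated clique exactly when $N(i) \cup \{i\}$ is a clique.

```python
def remove_isolated_cliques(adj):
    """Return (reduced adjacency, number of isolated cliques removed).

    Each removed clique lowers the independence number by exactly one, so
    alpha(G) = alpha(reduced graph) + number of removed cliques.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    removed = 0
    changed = True
    while changed:
        changed = False
        for i in list(adj):
            clique = adj[i] | {i}
            # N(i) together with i is a clique iff every neighbor sees all the others.
            if all(clique - {u} <= adj[u] for u in adj[i]):
                for u in clique:
                    for w in adj.pop(u):
                        if w in adj:
                            adj[w].discard(u)
                removed += 1
                changed = True
                break  # the graph changed, so rescan from the start
    return adj, removed


# Hypothetical test graphs: a path 0-1-2-3 and a 5-cycle.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

On the path, the cliques $\{0, 1\}$ and $\{2, 3\}$ are removed in turn and nothing remains, which matches $\alpha = 2$; the 5-cycle has no isolated clique and is returned unchanged.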

1. Finding a partition which divides the graph into disjoint cliques 
We partition the set of vertices $V$ of $G$ as follows:

$$V = \bigcup_{i=1}^{k} C_i,$$

where $C_i$, $i = 1, 2, \ldots, k$, are cliques such that $C_i \cap C_j = \emptyset$, $i \neq j$.
The cliques are found using a simple greedy algorithm. Starting with $C_1 = \emptyset$,
we pick the vertex $j \notin C_1$ that has the maximal number of neighbors
among those vertices outside of $C_1$ which are in the neighborhood of every
vertex from $C_1$. Set $C_1 = C_1 \cup \{j\}$, and repeat recursively, until there is
no vertex to add. Then remove $C_1$ from the graph, and repeat the above
procedure to obtain $C_2$. Continue in this way until the vertex set in the
graph is empty.
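
The greedy rule can be sketched as below, again with a dictionary of neighbor sets. Tie-breaking by smallest vertex label is an assumption made here for reproducibility; the text does not specify it.

```python
def greedy_clique_partition(adj):
    """Partition the vertices into disjoint cliques with the greedy rule above."""
    remaining = set(adj)
    cliques = []
    while remaining:
        clique = []
        candidates = set(remaining)  # vertices adjacent to every clique member
        while candidates:
            # Pick the candidate with the most neighbors among the candidates.
            j = max(sorted(candidates), key=lambda v: len(adj[v] & candidates))
            clique.append(j)
            candidates &= adj[j]
        cliques.append(clique)
        remaining -= set(clique)
    return cliques


# Hypothetical test graph: triangle 0-1-2 with a pendant vertex 3 attached to 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = greedy_clique_partition(graph)
```

Here the partition consists of the triangle and the single vertex 3. Since an independent set contains at most one vertex from each clique, the number of cliques (2) is an upper bound on the independence number.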

2. Finding an approximate solution 
An approximate solution is found using the approach described above. 

3. Finding an upper bound 
To obtain an upper bound for $\alpha(G)$ we can solve the following linear
program:

$$O_{\mathcal{C}}(G) = \max \sum_{i=1}^{n} x_i, \tag{12.3}$$
$$\text{s.t.} \quad \sum_{i \in C_j} x_i \le 1, \quad j = 1, \ldots, m, \tag{12.4}$$
$$x \ge 0, \tag{12.5}$$

where $C_j \in \mathcal{C}$ is a maximal clique and $\mathcal{C}$ is a set of maximal cliques
with $|\mathcal{C}| = m$. For a general graph the last constraint should read
$0 \le x_i \le 1$, $i = 1, \ldots, n$. But since an isolated vertex is an isolated clique
as well, after the preprocessing step our graph does not contain isolated
vertices and the above inequalities are implied by the set of clique constraints
(12.4) along with nonnegativity constraints (12.5). We call $O_{\mathcal{C}}(G)$
the linear clique estimate.


In order to find a tight bound $O_{\mathcal{C}}(G)$ one normally needs to consider a large
number of clique constraints. Therefore one deals with linear programs in
which the number of constraints may be much larger than the number
of variables. In this case it makes sense to consider the linear program
which is dual to problem (12.3)–(12.5). The dual problem can be written
as follows:

$$O_{\mathcal{C}}(G) = \min \sum_{j=1}^{m} y_j, \tag{12.6}$$
$$\text{s.t.} \quad \sum_{j=1}^{m} a_{ij} y_j \ge 1, \quad i = 1, \ldots, n, \tag{12.7}$$
$$y \ge 0, \tag{12.8}$$

where

$$a_{ij} = \begin{cases} 1, & \text{if } i \in C_j, \\ 0, & \text{otherwise.} \end{cases}$$

The number of constraints in the last LP is always equal to the number
of vertices in $G$. This gives us some advantages in comparison to problem
(12.3)–(12.5). If $m > n$, the dual problem is more suitable for solving with
the simplex method and interior point methods. Increasing the number of
clique constraints in problem (12.3)–(12.5) only leads to an increase in the
number of variables in problem (12.6)–(12.8). This provides a convenient
“restart” scheme (start from an optimal solution to the previous problem)
when additional clique constraints are generated.
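
As a short worked check of why any feasible point of the dual gives an upper bound, assume $x$ satisfies the clique constraints $\sum_{i \in C_j} x_i \le 1$, $x \ge 0$, and $y$ satisfies the covering constraints $\sum_j a_{ij} y_j \ge 1$, $y \ge 0$. Then the usual weak duality chain reads:

```latex
\sum_{i=1}^{n} x_i
  \;\le\; \sum_{i=1}^{n} x_i \sum_{j=1}^{m} a_{ij} y_j
  \;=\;   \sum_{j=1}^{m} y_j \sum_{i \in C_j} x_i
  \;\le\; \sum_{j=1}^{m} y_j .
```

The first inequality uses $x_i \ge 0$ together with the covering constraints, and the last uses $y_j \ge 0$ together with the clique constraints. In particular, if the set of cliques contains a partition of the vertices into $k$ cliques, setting $y_j = 1$ on those cliques and $0$ elsewhere is feasible, so the linear clique estimate is at most $k$.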


To solve problem (12.6)–(12.8) we used a variation of an interior point
method proposed by Dikin [8, 9]. We will call this version of interior
point method Dikin’s Interior Point Method, or DIPM. We present a
computational scheme of DIPM for an LP problem in the following form:

$$\min \sum_{i=1}^{m+n} c_i y_i, \tag{12.9}$$
$$\text{s.t.} \quad Ay = e, \tag{12.10}$$
$$y_i \ge 0, \quad i = 1, \ldots, m+n. \tag{12.11}$$

Here $A$ is an $n \times (m+n)$ matrix in which the first $m$ columns are determined
by coefficients $a_{ij}$ and columns $a_{m+i} = -e_i$ for $i = 1, \ldots, n$, where $e_i$ is
the $i$-th unit vector (orth). The vector $c \in \mathbb{R}^{m+n}$ has its first $m$ components equal to
one and the other $n$ components equal to zero; $e \in \mathbb{R}^n$ is the vector of all
ones. Problem (12.6)–(12.8) can be reduced to this form if the inequality
constraints in (12.7) are replaced by equality constraints. As the initial
point for the DIPM method we choose $y^0$ such that

$$y_i^0 = \begin{cases} 2, & \text{for } i = 1, \ldots, m, \\ 2 \sum_{j=1}^{m} a_{i-m,j} - 1, & \text{for } i = m+1, \ldots, m+n. \end{cases}$$

Now let $y^k$ be a feasible point for problem (12.9)–(12.11). In the DIPM
method the next point $y^{k+1}$ is obtained by the following scheme:

• Determine $D_k$ of dimension $(m+n) \times (m+n)$ as $D_k = \mathrm{diag}\{y^k\}$.
• Compute the vector
  $c^p = \left(I - (AD_k)^T (AD_k^2 A^T)^{-1} AD_k\right) D_k c$.
• Find $\rho_k = \max_{i = 1, \ldots, m+n} c_i^p$.
• Compute $y_i^{k+1} = y_i^k \left(1 - \alpha \, c_i^p / \rho_k\right)$, $i = 1, \ldots, m+n$, where $\alpha = 0.9$.


As the stopping criterion we used the condition 


mtn m+n 
SS cy} — yy Cj Fits 1 <e, where e = 107°. 
j=1 


The most labor-consuming operation of this method is the computation
of the vector $c^p$. This part was implemented using subroutines DPPFA
and DPPSL of LINPACK [10] for solving the following system of linear
equations:

$$A D_k^2 A^T u_k = A D_k^2 c, \quad u_k \in \mathbb{R}^n.$$

In this implementation the time complexity of one iteration of DIPM can
be estimated as $O(n^3)$.

The values of the vector $u_k = (A D_k^2 A^T)^{-1} A D_k^2 c$ found from the last system
define the dual variables in problem (12.6)–(12.8) (Lagrange multipliers
for constraints (12.7)). The optimal values of the dual variables were
then used as weight coefficients for finding additional clique constraints,
which help to reduce the linear clique estimate $O_{\mathcal{C}}(G)$. The problem of
finding weighted cliques was solved using an approximation algorithm; a
maximum of 1000 cliques were added to the constraints.
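
The iteration can be illustrated with a small pure-Python sketch. It assumes the standard affine-scaling reading of the scheme, namely $c^p = D_k (c - A^T u_k)$ with $A D_k^2 A^T u_k = A D_k^2 c$, followed by $y_i^{k+1} = y_i^k (1 - \alpha c_i^p / \rho_k)$. A plain Gaussian elimination stands in for the LINPACK routines, and the tolerance `eps` and the iteration cap are illustrative choices, not values taken from the chapter.

```python
def solve(M, b):
    """Solve M u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * u[k] for k in range(r + 1, n))
        u[r] = (aug[r][n] - s) / aug[r][r]
    return u


def dipm(a, alpha=0.9, eps=1e-9, max_iter=200):
    """Affine scaling for min sum_j y_j s.t. sum_j a[i][j] y_j >= 1, y >= 0."""
    n, m = len(a), len(a[0])
    # A = [a | -I] has n rows and m + n columns; c is m ones followed by n zeros.
    A = [a[i] + [-1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]
    c = [1.0] * m + [0.0] * n
    y = [2.0] * m + [2.0 * sum(a[i]) - 1.0 for i in range(n)]  # initial point
    for _ in range(max_iter):
        d2 = [v * v for v in y]
        M = [[sum(A[i][k] * d2[k] * A[r][k] for k in range(m + n)) for r in range(n)]
             for i in range(n)]
        rhs = [sum(A[i][k] * d2[k] * c[k] for k in range(m + n)) for i in range(n)]
        u = solve(M, rhs)  # dual estimates u_k
        cp = [y[k] * (c[k] - sum(A[i][k] * u[i] for i in range(n)))
              for k in range(m + n)]
        rho = max(cp)
        if rho <= 0:
            break
        y_new = [y[k] * (1 - alpha * cp[k] / rho) for k in range(m + n)]
        decrease = sum(c[k] * (y[k] - y_new[k]) for k in range(m + n))
        y = y_new
        if decrease < eps:
            break
    return y[:m], sum(y[:m])


# Hypothetical instance: path 1-2-3 with cliques {1, 2} and {2, 3}.
a = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
y_opt, estimate = dipm(a)
```

For this instance the estimate approaches 2, which equals the independence number of the path.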
4. A Branch-and-Bound algorithm 


(a) Branching: Based on the fact that the number of vertices from a clique
that can be included in an independent set is always equal to 0 or 1.
(b) Bounding: We use the approximate solution found as a lower bound and
the linear clique estimate $O_{\mathcal{C}}(G)$ as an upper bound.


Tables 12.2 and 12.3 contain a summary of the numerical experiments with
the exact algorithm. In Table 12.2 Column “#” contains a number assigned to
each connected component of a graph after the preprocessing. Columns “1,”
“2” and “3” stand for the number of cliques in the partition, the solution
found by the approximation algorithm and the value of the upper bound
$O_{\mathcal{C}}(G)$, respectively. In Table 12.3 Column “$\alpha(G)$” contains the independence
number of the corresponding instance found by the exact algorithm; Column
“Time” summarizes the total time needed to find $\alpha(G)$.


Table 12.2 Exact algorithm: Computational results


Graph     #    1     2     3
1tc128    1    5     4     4.0002
          2    5     5     5.0002
          3    5     5     5.0002
          4    5     4     4.0001
1tc256    1    6     5     5.0002
          2    10    9     9.2501
          3    19    13    13.7501
          4    10    9     9.5003
          5    6     5     5.0003
1tc512    1    10    7     7.0003
          2    18    14    14.9221
          3    29    22    23.6836
          4    29    22    23.6811
          5    18    14    14.9232
          6    10    7     7.0002
1dc512    1    75    50    51.3167
2dc512    1    16    9     10.9674
1et128    1    3     2     3.0004
          2    6     4     5.0003
          3    9     7     7.0002
          4    9     7     7.0002
          5    6     4     5.0003
          6    3     2     3.0004
1et256    1    3     2     3.0002
          2    8     6     6.0006
          3    14    10    12.0001
          4    22    12    14.4002
          5    14    10    12.0004
          6    8     6     6.0005
          7          2     3.0002
1et512    1    3     3     3.0000
          2    10          8.2502
          3    27    18    18.0006
          4    29    21    23.0626
          5                23.1029
          6    27    18    18.0009
          7    10    7     8.2501
          8    3     3     3.0000


Table 12.3 Exact solutions found


Graph     n      |E|      α(G)    Time (s)
1dc512    512    9727     52      2118
2dc512    512    54895    11      2618
1tc128    128    512      38      7
1tc256    256    1312     63      39
1tc512    512    3264     110     141
1et128    128    672      28      25
1et256    256    1664     50      72
1et512    512    4032     100     143

Among the exact solutions presented in Table 12.3 only two were previously
known, for 2dc512 and 1et128. The rest were either unknown or were
not proved to be exact.


12.3 Lower Bounds for Codes Correcting One Error 
on the Z-Channel 


The error-correcting codes for the Z-channel have very important practical
applications. The Z-channel shown in Fig. 12.1 is an asymmetric binary
channel, in which the probability of transformation of 1 into 0 is $p$, and the
probability of transformation of 0 into 1 is 0.


Fig. 12.1 A scheme of the Z-channel


The problem we are interested in is that of finding good estimates for the 
size of the largest codes correcting one error on the Z-channel. 

Let us introduce some background information related to asymmetric 
codes. 
The asymmetric distance $d_A(x, y)$ between vectors $x, y \in B^l$ is defined as
follows [20]:

$$d_A(x, y) = \max\{N(x, y), N(y, x)\}, \tag{12.12}$$

where $N(x, y) = |\{i : (x_i = 0) \wedge (y_i = 1)\}|$. It is related to the Hamming
distance $d_H(x, y) = \sum_{i=1}^{l} |x_i - y_i| = N(x, y) + N(y, x)$ by the expression

$$2 d_A(x, y) = d_H(x, y) + |w(x) - w(y)|, \tag{12.13}$$

where $w(x) = \sum_{i=1}^{l} x_i = |\{i : x_i = 1\}|$ is the weight of $x$. Let us define the
minimum asymmetric distance $\Delta$ for a code $C \subset B^l$ as

$$\Delta = \min\{d_A(x, y) \mid x, y \in C,\ x \neq y\}.$$


It was shown in [20] that a code $C$ with the minimum asymmetric distance
$\Delta$ can correct at most $(\Delta - 1)$ asymmetric errors (transitions of 1 to 0). In
this section we present new lower bounds for codes with the minimum
asymmetric distance $\Delta = 2$.

Let us define the graph $G = (V(l), E(l))$, where the set of vertices $V(l) = B^l$
consists of all binary vectors of length $l$, and $(v_i, v_j) \in E(l)$ if and only
if $d_A(v_i, v_j) < \Delta$. Then the problem of finding the size of the code with
minimal asymmetric distance $\Delta$ is reduced to the maximum independent set
problem in this graph. Table 12.4 contains the lower bounds obtained using
the algorithm presented above in this section (some of which were mentioned
in Table 12.1).
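
The definitions can be checked numerically on short vectors. The sketch below represents vectors as strings (an assumption of this illustration), verifies identity (12.13) for all pairs of length 4, and confirms that the graph defined by $d_A < 2$ has the same edges as the graph built from the Z-channel error sets of Section 12.1.

```python
from itertools import product


def n_count(x, y):
    """N(x, y): number of positions i with x_i = 0 and y_i = 1."""
    return sum(1 for a, b in zip(x, y) if a == "0" and b == "1")


def asymmetric_distance(x, y):
    """d_A(x, y) = max{N(x, y), N(y, x)}."""
    return max(n_count(x, y), n_count(y, x))


def hamming_distance(x, y):
    """d_H(x, y): number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def z_channel(u):
    """F_1z(u): u itself, or u with exactly one 1 changed to 0."""
    return {u} | {u[:i] + "0" + u[i + 1:] for i, b in enumerate(u) if b == "1"}


words = ["".join(bits) for bits in product("01", repeat=4)]

identity_holds = all(
    2 * asymmetric_distance(x, y)
    == hamming_distance(x, y) + abs(x.count("1") - y.count("1"))
    for x in words for y in words
)
same_graph = all(
    (asymmetric_distance(x, y) < 2) == bool(z_channel(x) & z_channel(y))
    for x in words for y in words if x != y
)
```

Both flags evaluate to `True` for length 4; for example, $x = 1100$ and $y = 0110$ have $d_A = 1$ and share the vector $0100$ in their error sets.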


Table 12.4 Lower bounds obtained in: a [27]; b [6]; c [7]; d [12]; e (this chapter)


l     Lower Bound      Upper Bound
4     4                4
5     6a               6
6     12b              12
7     18c              18
8     36c              36
9     62c              62
10    112d             117
11    198d             210
12    379e (378d)      410


12.3.1 The partitioning method 


The partitioning method [4, 12, 26] uses independent set partitions of the
vertices of graph $G$ in order to obtain a lower bound for the code size. An
independent set partition is a partition of vertices into independent sets such
that each vertex belongs to exactly one independent set, that is,

$$V(l) = \bigcup_{i=1}^{m} I_i, \quad I_i \text{ is an independent set}, \quad I_i \cap I_j = \emptyset,\ i \neq j. \tag{12.14}$$


Recall that the problem of finding the smallest m for which a partition of 
the vertices into m disjoint independent sets exists is the well-known graph 
coloring problem. 

The independent set partition (12.14) can be identified by the vector

$$\Pi(l) = (I_1, I_2, \ldots, I_m).$$

We associate the vector

$$\pi(l) = (|I_1|, |I_2|, \ldots, |I_m|),$$

which is called the index vector of partition $\Pi(l)$, with $\Pi(l)$. Its norm is
defined as

$$\pi(l) \cdot \pi(l) = \sum_{i=1}^{m} |I_i|^2.$$

We will assume that $|I_1| \ge |I_2| \ge \ldots \ge |I_m|$.

In terms of the codes, the independent set partition is a partition of words 
(binary vectors) into a set of codes, where each code corresponds to an inde- 
pendent set in the graph. 

Similarly, for the set of all binary vectors of weight $w$ we can construct a
graph $G(l, w)$, in which the set of vertices is the set of the $\binom{l}{w}$ vectors, and
two vertices are adjacent iff the Hamming distance between the corresponding
vectors is less than 4. Then an independent set partition

$$\Pi(l, w) = (I_1^w, I_2^w, \ldots, I_m^w)$$

can be considered in which each independent set will correspond to a subcode
with minimum Hamming distance 4. The index vector and its norm are
defined in the same way as for $\Pi(l)$.

By the direct product $\Pi(l_1) \times \Pi(l_2, w)$ of a partition of asymmetric codes
$\Pi(l_1) = (I_1, I_2, \ldots, I_{m_1})$ and a partition of constant weight codes
$\Pi(l_2, w) = (I_1^w, I_2^w, \ldots, I_{m_2}^w)$ we will mean the set of vectors

$$C = \{(u, v) : u \in I_i,\ v \in I_i^w,\ 1 \le i \le m\},$$

where $m = \min\{m_1, m_2\}$. It appears that $C$ is a code of length $l = l_1 + l_2$
with minimum asymmetric distance 2, that is, a code correcting one error on
the Z-channel of length $l = l_1 + l_2$ [12].

In order to find a code $C$ of length $l$ and minimum asymmetric distance 2
by the partitioning method, we can use the following construction procedure:

1. Choose $l_1$ and $l_2$ such that $l_1 + l_2 = l$.
2. Choose $\epsilon = 0$ or $1$.
3. Set

$$C = \bigcup_{i=0}^{\lfloor l_2/2 \rfloor} \left( \Pi(l_1) \times \Pi(l_2, 2i + \epsilon) \right). \tag{12.15}$$


12.3.2 The partitioning algorithm 


One of the popular heuristic approaches to the independent set partitioning
(graph coloring) problem is the following. Suppose that a graph $G = (V, E)$
is given.


INPUT: $G = (V, E)$;
OUTPUT: $I_1, I_2, \ldots, I_m$.

0. $i = 1$;
1. while $G \neq \emptyset$
       find a maximal independent set $I$; set $I_i = I$; $i = i + 1$;
       $G = G - G(I)$, where $G(I)$ is the subgraph induced by $I$;
   end
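
A runnable sketch of this simple heuristic is given below. The rule used to find each maximal independent set (scan the remaining vertices by increasing degree, ties broken by label) is an assumption of this illustration; the text leaves that choice open.

```python
def independent_set_partition(adj):
    """Repeatedly extract a maximal independent set from the remaining graph."""
    remaining = set(adj)
    partition = []
    while remaining:
        ind_set = set()
        # Scan by increasing degree within the remaining graph (heuristic choice).
        for v in sorted(remaining, key=lambda v: (len(adj[v] & remaining), v)):
            if not (adj[v] & ind_set):
                ind_set.add(v)
        partition.append(ind_set)
        remaining -= ind_set
    return partition


def index_vector(partition):
    """Sizes of the independent sets in nonincreasing order."""
    return sorted((len(s) for s in partition), reverse=True)


def norm(partition):
    """Norm of the index vector: sum of squared set sizes."""
    return sum(len(s) ** 2 for s in partition)


# Hypothetical test graph: the 5-cycle 0-1-2-3-4-0 (chromatic number 3).
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
parts = independent_set_partition(cycle)
```

On the 5-cycle the sketch returns three independent sets with index vector $(2, 2, 1)$ and norm 9.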


In [22, 23] an improvement of this approach was proposed by finding at
each step a specified number of maximal independent sets. Then a new graph
$\mathcal{G}$ is constructed, in which a vertex corresponds to a maximal independent
set, and two vertices are adjacent iff the corresponding independent sets have
common vertices. In the graph $\mathcal{G}$, a few maximal independent sets are found,
and the best of them (say, the one with the least number of adjacent edges in
the corresponding independent sets of $G$) is chosen. This approach is formally
described in Figure 12.2.


12.3.3 Improved lower bounds for code sizes 


The partitions obtained using the described partition algorithm are given in 
Tables 12.5 and 12.6. These partitions, together with the facts that [11] 


• $\Pi(l, 0)$ consists of one (zero) codeword,

• $\Pi(l, 1)$ consists of $l$ codes of size 1,

• $\Pi(l, 2)$ consists of $l - 1$ codes of size $l/2$ for even $l$,

• the index vectors of $\Pi(l, w)$ and $\Pi(l, l - w)$ are equal

INPUT: $G = (V, E)$, $N$;
OUTPUT: $I_1, I_2, \ldots, I_m$.

0. $i = 0$;
1. while $G \neq \emptyset$
       for $j = 1$ to $N$
           Find a maximal independent set $IS_j$;
           if $|IS_j| = |IS_{j-1}|$ break
       end
       Construct graph $\mathcal{G}$;
       Find a maximal independent set $MIS = \{IS_{j_1}, \ldots, IS_{j_p}\}$ of $\mathcal{G}$;
       $I_{i+q} = IS_{j_q}$, $q = 1, \ldots, p$;
       $G = G - \bigcup_{q=1}^{p} G(I_{i+q})$; $i = i + p$;
   end


Fig. 12.2 Algorithm for finding independent set partitions


Table 12.5 Partitions of asymmetric codes found


l     #     Partition Index Vector                                       Norm     m
8     1     36, 34, 34, 33, 30, 29, 26, 25, 9                            7820     9
9     1     62, 62, 62, 61, 58, 56, 53, 46, 29, 18, 5                    27868    11
      2     62, 62, 62, 62, 58, 56, 53, 43, 32, 16, 6                    27850    11
      3     62, 62, 62, 61, 58, 56, 52, 46, 31, 17, 5                    27848    11
      4     62, 62, 62, 62, 58, 56, 52, 43, 33, 17, 5                    27832    11
      5     62, 62, 62, 62, 58, 56, 54, 42, 31, 15, 8                    27806    11
      6     62, 62, 62, 60, 57, 55, 52, 45, 31, 18, 8                    27794    11
      7     62, 62, 62, 60, 58, 55, 51, 45, 37, 16, 4                    27788    11
      8     62, 62, 62, 60, 58, 56, 53, 45, 32, 16, 6                    27782    11
      9     62, 62, 62, 62, 58, 56, 52, 43, 32, 17, 6                    27778    11
      10    62, 62, 62, 60, 58, 56, 53, 45, 31, 18, 5                    27776    11
      11    62, 62, 62, 62, 58, 56, 50, 45, 32, 18, 5                    27774    11
      12    62, 62, 62, 61, 58, 56, 51, 45, 30, 22, 3                    27772    11
      13    62, 62, 62, 62, 58, 56, 50, 44, 34, 16, 6                    27760    11
      14    62, 62, 62, 62, 58, 55, 51, 44, 32, 20, 4                    27742    11
10    1     112, 110, 110, 109, 105, 100, 99, 88, 75, 59, 37, 16, 4      97942    13
      2     112, 110, 110, 109, 105, 101, 96, 87, 77, 60, 38, 15, 4      97850    13
      3     112, 110, 110, 108, 106, 99, 95, 89, 76, 60, 43, 15, 1       97842    13
      4     112, 110, 110, 108, 105, 100, 96, 88, 74, 65, 38, 17, 1      97828    13
      5     112, 110, 110, 108, 106, 103, 95, 85, 76, 60, 40, 15, 4      97720    13
      6     112, 110, 110, 108, 106, 101, 95, 87, 75, 61, 40, 17, 2      97678    13
      7     112, 110, 109, 108, 105, 101, 96, 86, 78, 63, 36, 17, 3      97674    13



Table 12.6 Partitions of constant weight codes obtained in: a (this chapter); b [4]; c [12]


l     w    #     Partition Index Vector                                          Norm      m
10    4    1a    30, 30, 30, 30, 26, 25, 22, 15, 2                               5614      9
12    4    1a    51, 51, 51, 51, 49, 48, 48, 42, 42, 37, 23, 2                   22843     12
12    4    2a    51, 51, 51, 51, 49, 48, 48, 45, 39, 36, 22, 4                   22755     12
12    4    3a    51, 51, 51, 51, 49, 48, 48, 45, 41, 32, 22, 6                   22663     12
12    6    1a    132, 132, 120, 120, 110, 94, 90, 76, 36, 14                     99952     10
14    4    1c    91, 91, 88, 87, 84, 82, 81, 79, 76, 73, 66, 54, 38, 11          78399     14
14    4    2c    91, 90, 88, 85, 84, 83, 81, 79, 76, 72, 67, 59, 34, 11, 1       78305     15
14    6    1b    278, 273, 265, 257, 250, 231, 229, 219, 211,                    672203    16
                 203, 184, 156, 127, 81, 35, 4


were used in (12.15), with $\epsilon = 0$, to obtain new lower bounds for the
asymmetric codes presented in Table 12.7. To illustrate how the lower bounds
were computed, let us show how the code for $l = 18$ was constructed. We use
$l_1 = 8$ and $l_2 = 10$:

$|\Pi(8) \times \Pi(10, 0)| = 36 \cdot 1 = 36$;

$|\Pi(8) \times \Pi(10, 2)| = 256 \cdot 5 = 1280$;

$|\Pi(8) \times \Pi(10, 4)| = 36 \cdot 30 + 34 \cdot 30 + 34 \cdot 30 + 33 \cdot 30 + 30 \cdot 26 + 29 \cdot 25 + 26 \cdot 22 + 25 \cdot 15 + 9 \cdot 2 = 6580$;

$|\Pi(8) \times \Pi(10, 6)| = |\Pi(8) \times \Pi(10, 4)| = 6580$;

$|\Pi(8) \times \Pi(10, 8)| = |\Pi(8) \times \Pi(10, 2)| = 1280$;

$|\Pi(8) \times \Pi(10, 10)| = |\Pi(8) \times \Pi(10, 0)| = 36$.


The total is $2(36 + 1280 + 6580) = 15792$ codewords.
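
The arithmetic of this example can be reproduced from the index vectors alone, since the size of a direct product is $\sum_{i \le m} |I_i| \, |I_i^w|$ with $m = \min\{m_1, m_2\}$. The sketch below uses the index vectors from Tables 12.5 and 12.6 and the listed facts about weights 0 and 2; the variable names are hypothetical.

```python
def product_size(pi1, pi2):
    """Size of the direct product: sum of |I_i| * |I_i^w| over the first min(m1, m2) classes."""
    return sum(a * b for a, b in zip(pi1, pi2))


pi_8 = [36, 34, 34, 33, 30, 29, 26, 25, 9]     # Table 12.5, l = 8
pi_10 = {
    0: [1],                                     # one (zero) codeword
    2: [5] * 9,                                 # l - 1 codes of size l / 2
    4: [30, 30, 30, 30, 26, 25, 22, 15, 2],     # Table 12.6, l = 10, w = 4
}
for w in (0, 2, 4):
    pi_10[10 - w] = pi_10[w]                    # index vectors for w and l - w agree

sizes = {w: product_size(pi_8, pi_10[w]) for w in sorted(pi_10)}
total = sum(sizes.values())
```

The dictionary `sizes` holds 36, 1280, 6580, 6580, 1280 and 36 for the weights 0, 2, ..., 10, and `total` equals 15792, the new lower bound for $l = 18$.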


Table 12.7 New lower bounds. Previous lower bounds were found in: a [11]; b [12]


      Lower Bound
l     New       Previous
18    15792     15762a
19    29478     29334b
20    56196     56144b
21    107862    107648b
22    202130    201508b
24    678860    678098b


12.4 Conclusions 


In this chapter we have dealt with binary codes of given length correcting 
certain types of errors. For such codes, a graph can be constructed in which 
each vertex corresponds to a binary vector and the edges are built such 
that each independent set corresponds to a correcting code. The problem
of finding the largest code is thus reduced to the maximum independent set 
problem in the corresponding graph. For asymmetric codes, we also applied 
the partitioning method, which utilizes independent set partitions (or graph 
colorings) in order to obtain lower bounds for the maximum code sizes. 

We use efficient approaches to the maximum independent set and graph 
coloring problems to deal with the problem of estimating the largest code 
sizes. As a result, some improved lower bounds and exact solutions for the 
size of the largest error-correcting codes were obtained. 

Chapter 13


New perspectives on optimal 
transforms of random vectors 


P. G. Howlett, C. E. M. Pearce and A. P. Torokhti 


Abstract We present a new transform which is optimal over the class of 
transforms generated by second-degree polynomial operators. The transform 
is based on the solution of the best constrained approximation problem with 
the approximant formed by a polynomial operator. It is shown that the new 
transform has advantages over the Karhunen-Loève transform, arguably the
most popular transform, which is optimal over the class of linear transforms 
of fixed rank. We provide a strict justification of the technique, demonstrate 
its advantages and describe useful extensions and applications. 


Key words: Optimal transforms, singular-value decomposition, filtering, 
compression, tensors, random signals 


13.1 Introduction and statement of the problem


Optimal transforms of random vectors have been applied successfully to
many problems in signal processing including, for example, the filtering and
compression of random signals and the classification and clustering of signals
[4, 8, 18].

Known transforms are mainly based on linear models. The Karhunen-Loève
transform is perhaps the most popular linear transform and achieves
the smallest associated error of all linear transforms of fixed rank. Recently 
Hua and Liu [8] generalized it to the case where no relationship is assumed 
between a stochastic signal and noise. 

Although the associated error cannot be reduced by the use of any other 
linear transform of the same rank, the performance of this transform is still 
unsatisfactory in many applications. See the simulations in Section 13.7 in 
this connection. In this chapter we present a new nonlinear transform with 
a substantially better performance than that of the generalized Karhunen-Loève
transform (GKLT) of [8]. In particular, we show that for the same
rank our transform possesses a much smaller associated error. Our method 
is based on the best constrained approximation of a stochastic signal by an 
approximant generated by a second-degree polynomial operator. The tech- 
nique is based on the primary concept presented in [14]-[16]. We begin with 
a rigorous statement of the problem. 

Let $(\Omega, \Sigma, \mu)$ be a probability space, with $\Omega$ the set of outcomes, $\Sigma$ the
minimal $\sigma$-field of measurable subsets of $\Omega$ and $\mu : \Sigma \to [0, 1]$ an associated
probability measure on $\Sigma$. Suppose that $\mathbf{x} \in L^2(\Omega, \mathbb{R}^m)$ and $\mathbf{y} \in L^2(\Omega, \mathbb{R}^n)$
are random vectors with realizations $\mathbf{x}(\omega) \in \mathbb{R}^m$ and $\mathbf{y}(\omega) \in \mathbb{R}^n$. We interpret
$\mathbf{x}$ as a given “idealized” signal (without any distortion) and $\mathbf{y}$ as an observed
signal. In particular, $\mathbf{y}$ can be interpreted as $\mathbf{x}$ contaminated with noise so that
no specific relationships between signal and noise are assumed. For instance,
noise can be additive, multiplicative or a combination of the two.

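Each operator $F : \mathbb{R}^m \to \mathbb{R}^n$ defines an associated operator
$\mathcal{F} : L^2(\Omega, \mathbb{R}^m) \to L^2(\Omega, \mathbb{R}^n)$ via

$$[\mathcal{F}(\mathbf{x})](\omega) = F[\mathbf{x}(\omega)] \quad \text{for each } \omega \in \Omega. \tag{13.1}$$

It is customary to write $F(\mathbf{x})$ rather than $\mathcal{F}(\mathbf{x})$, since we have
$[\mathcal{F}(\mathbf{x})](\omega) = F[\mathbf{x}(\omega)]$ for each $\omega \in \Omega$. It is also convenient to write
$x$ for $\mathbf{x}(\omega)$, $y$ for $\mathbf{y}(\omega)$, etc.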

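Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be the operator associated with the map
$\mathcal{T} : L^2(\Omega, \mathbb{R}^n) \to L^2(\Omega, \mathbb{R}^m)$ by an equation similar to (13.1). Suppose $T$ is
given by

$$T(y) = A + \sum_{j=0}^{n} B_j z_j, \tag{13.2}$$

where $A \in \mathbb{R}^m$, $B_j \in \mathbb{R}^{m \times n}$ ($j = 0, 1, \ldots, n$), $z_0 = y$, $y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$
and $z_j = y_j y$ for $j = 1, \ldots, n$. Then the operator $T$ is completely defined by
$A$ and $B_j$ for $j = 0, 1, \ldots, n$.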

The problem is to find a vector $A^0$ and matrices $B_j^0$ such that

$$J(A^0, B_0^0, \ldots, B_n^0) = \min_{A, B_0, \ldots, B_n} J(A, B_0, \ldots, B_n), \tag{13.3}$$

subject to

$$\operatorname{rank}[A \ B_0 \ B_1 \ldots B_n] = r \tag{13.4}$$

with $r < m$. Here

$$J(A, B_0, \ldots, B_n) = E\left[ \Big\| x - \Big( A + \sum_{j=0}^{n} B_j z_j \Big) \Big\|^2 \right] \tag{13.5}$$

with $E$ the expectation operator, $\|\cdot\|$ the Frobenius norm, $q = n^2 + n + 1$
and $[A \ B_0 \ B_1 \ldots B_n] \in \mathbb{R}^{m \times q}$.


13.2 Motivation of the statement of the problem 


Equations (13.3)–(13.5) represent the best constrained approximation problem.
It is well known that a nonlinear approximant normally possesses a
smaller associated error than that of a linear approximation. Therefore it is 
natural to seek a suitable nonlinear form of approximant. 

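Let us consider the nonlinear operator $T$ given by

$$T(y) = A + B_0 y + C(y, y), \tag{13.6}$$

where $C : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$ is a bilinear operator, that is, $(y, y) \in \mathbb{R}^n \times \mathbb{R}^n$.
The operator $C$ is an $(m \times n \times n)$-tensor, $C = \{c_{ijk}\} \in \mathbb{R}^{m \times n \times n}$. Therefore
the vector $C(y, y)$ can be presented as the product of a tensor $C$ and vector
$y$ and also as a product of the matrix $Cy$ with the vector $y$. As a result we
have

$$C(y, y) = (Cy)y = B_1 y_1 y + \ldots + B_n y_n y,$$

where $B_1 = \{c_{i1k}\} \in \mathbb{R}^{m \times n}$, ..., $B_n = \{c_{ink}\} \in \mathbb{R}^{m \times n}$. Alternatively,
$B_1 = \{c_{ij1}\} \in \mathbb{R}^{m \times n}$, ..., $B_n = \{c_{ijn}\} \in \mathbb{R}^{m \times n}$. Hence (13.6) coincides with
(13.2).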
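
The identity between the tensor form and the sum over the matrices $B_j$ can be checked on a small numerical example. The sketch assumes $C(y, y)_i = \sum_{j,k} c_{ijk} y_j y_k$ and $B_j = \{c_{ijk}\}$ with $i$ and $k$ as free indices; the data are hypothetical.

```python
def bilinear(C, y):
    """C(y, y)_i = sum over j, k of c_ijk * y_j * y_k for an m x n x n tensor C."""
    n = len(y)
    return [sum(Ci[j][k] * y[j] * y[k] for j in range(n) for k in range(n)) for Ci in C]


def via_matrices(C, y):
    """Sum over j of B_j z_j, with B_j = {c_ijk} (i, k free) and z_j = y_j * y."""
    m, n = len(C), len(y)
    out = [0] * m
    for j in range(n):
        z_j = [y[j] * y_k for y_k in y]
        for i in range(m):
            out[i] += sum(C[i][j][k] * z_j[k] for k in range(n))
    return out


# Hypothetical data with m = 2 and n = 2.
C = [[[1, 2], [3, 4]], [[0, -1], [5, 2]]]
y = [2, -3]
```

Both functions return the vector $(10, -6)$ for these data, illustrating that the quadratic term in (13.6) is a sum of terms $B_j z_j$ of the kind appearing in (13.2).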


In the following four sections we show that the transform $T^0$ produced by
the nonlinear approximant

$$T^0(y) = A^0 + \sum_{j=0}^{n} B_j^0 z_j \tag{13.7}$$

possesses a much smaller associated error than that of the GKLT. We then
proceed to address applications and simulations.

13.3 Preliminaries

For any random vectors $u$, $w$, we write $E_{uw} = E[uw^T]$ and $\mathbb{E}_{uw} = E_{uw} - E[u]E[w^T]$.

The symbol $M^\dagger$ denotes the Moore-Penrose pseudo-inverse of a matrix $M$
(see [2]).

Lemma 13.3.1 We have the relations 

Cue era Cake ae ea ee Ee Oo et es 

(13.8) 

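Lemma 13.3.2 Let $z = [z_1^T \cdots z_n^T]^T \in \mathbb{R}^{n^2}$,

$$D = \mathbb{E}_{zz} - \mathbb{E}_{zy}\mathbb{E}_{yy}^\dagger\mathbb{E}_{yz} \quad\text{and}\quad G = \mathbb{E}_{xz} - \mathbb{E}_{xy}\mathbb{E}_{yy}^\dagger\mathbb{E}_{yz}.$$

Then

$$G D^\dagger D = G.$$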


The proofs are similar to those of Lemmas 2 and 3 in [14]. 
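For the following result it is convenient to write $s = [1\ y^T\ z^T]^T \in \mathbb{R}^q$.

Lemma 13.3.3 Let

$$P_{11} = 1 - P_{12}E[y] - P_{13}E[z],\quad P_{12} = P_{21}^T,\quad P_{13} = -E[y^T]P_{23} - E[z^T]P_{33},$$
$$P_{21} = -P_{22}E[y] - P_{23}E[z],\quad P_{22} = \mathbb{E}_{yy}^\dagger - P_{23}\mathbb{E}_{zy}\mathbb{E}_{yy}^\dagger,\quad P_{23} = P_{32}^T,$$
$$P_{31} = -P_{33}E[z] - P_{32}E[y],\quad P_{32} = -P_{33}\mathbb{E}_{zy}\mathbb{E}_{yy}^\dagger,\quad P_{33} = D^\dagger.$$

Then

$$E_{ss}^\dagger = \begin{bmatrix} P_{11} & P_{12} & P_{13}\\ P_{21} & P_{22} & P_{23}\\ P_{31} & P_{32} & P_{33}\end{bmatrix}. \tag{13.9}$$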
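Proof. Let

$$t = \begin{bmatrix} 1\\ y\end{bmatrix},\quad S_{11} = 1 - S_{12}E[y],\quad S_{12} = -E[y^T]S_{22},\quad S_{21} = S_{12}^T,\quad S_{22} = \mathbb{E}_{yy}^\dagger.$$

First we show that

$$E_{tt}^\dagger = \begin{bmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix}. \tag{13.10}$$

We have

$$E_{tt}\begin{bmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix}E_{tt} = \begin{bmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22}\end{bmatrix},$$

where

$$Q_{11} = S_{11} + E[y^T]S_{21} + S_{12}E[y] + E[y^T]S_{22}E[y] = 1,$$
$$Q_{12} = S_{11}E[y^T] + E[y^T]S_{21}E[y^T] + S_{12}E_{yy} + E[y^T]S_{22}E_{yy} = E[y^T],$$
$$Q_{21} = E[y]S_{11} + E_{yy}S_{21} + E[y]S_{12}E[y] + E_{yy}S_{22}E[y] = E[y]$$

and

$$Q_{22} = E[y]S_{11}E[y^T] + E_{yy}S_{21}E[y^T] + E[y]S_{12}E_{yy} + E_{yy}S_{22}E_{yy} = E_{yy}.$$

Hence $E_{tt}E_{tt}^\dagger E_{tt} = E_{tt}$, that is, the first condition for the Moore-Penrose
inverse of $E_{tt}$ to be given by (13.10) is satisfied. The remaining
Moore-Penrose conditions for $E_{tt}^\dagger$ are also easily verified, and therefore
(13.10) is established.

Next, let

$$R_{11} = \mathbb{E}_{yy}^\dagger - R_{12}\mathbb{E}_{zy}\mathbb{E}_{yy}^\dagger,\quad R_{12} = R_{21}^T, \tag{13.11}$$
$$R_{21} = -R_{22}\mathbb{E}_{zy}\mathbb{E}_{yy}^\dagger \quad\text{and}\quad R_{22} = D_{zz}^\dagger, \tag{13.12}$$

where $D_{zz} = \mathbb{E}_{zz} - \mathbb{E}_{zy}\mathbb{E}_{yy}^\dagger\mathbb{E}_{yz} = D$.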
Arguing much as above, we have by Lemmas 13.3.1 and 13.3.2 that 


Et,= oe al ; (13.13) 


Relation (13.9) follows from (13.13) by virtue of (13.10)—(13.12). 


13.4 Main results 


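We denote by $U\Sigma V^T$ the singular-value decomposition of $E_{xs}(E_{ss}^{1/2})^\dagger$, that is,

$$U\Sigma V^T = E_{xs}(E_{ss}^{1/2})^\dagger, \tag{13.14}$$

where

$$U = (u_1,\ldots,u_m) \in \mathbb{R}^{m\times m} \quad\text{and}\quad V = (v_1,\ldots,v_q) \in \mathbb{R}^{q\times q}$$

are orthogonal matrices and

$$\Sigma = \operatorname{diag}(\sigma_1,\ldots,\sigma_q) \in \mathbb{R}^{m\times q}$$

is a diagonal matrix with $\sigma_1 \ge \cdots \ge \sigma_k > 0$ and $\sigma_{k+1} = \cdots = \sigma_q = 0$. Put

$$U_r = (u_1,\ldots,u_r),\quad V_r = (v_1,\ldots,v_r),\quad \Sigma_r = \operatorname{diag}(\sigma_1,\ldots,\sigma_r)$$

and define

$$\Phi_r = U_r\Sigma_r V_r^T. \tag{13.15}$$

Suppose

$$\Phi = [A\ B_0 \ldots B_n] \in \mathbb{R}^{m\times q}$$

and let $\Phi(:,\eta:\tau)$ be the matrix formed by the $\tau - \eta + 1$ sequential columns
of $\Phi$ beginning with column $\eta$.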

The optimal transform T°, introduced by (13.7), is defined by the following 
theorem. 


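Theorem 13.4.1 The solution to problem (13.3) is given by

$$A^0 = \Phi^0(:,1:1),\quad B_j^0 = \Phi^0(:,\,jn+2 : jn+n+1), \tag{13.16}$$

for $j = 0,1,\ldots,n$, where

$$\Phi^0 = \Phi_r(E_{ss}^{1/2})^\dagger + M_r[I - E_{ss}^{1/2}(E_{ss}^{1/2})^\dagger]$$

with $I$ the identity matrix and $M_r \in \mathbb{R}^{m\times q}$ any matrix such that $\operatorname{rank}\Phi^0 \le r < m$.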


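Proof. We have

$$J(A,B_0,\ldots,B_n) = E[\|x - \Phi s\|^2].$$

By Lemma 13.3.1,

$$J(A,B_0,\ldots,B_n) = \operatorname{tr}\{E_{xx} - E_{xs}E_{ss}^\dagger E_{sx}\} + \operatorname{tr}\{(\Phi - E_{xs}E_{ss}^\dagger)E_{ss}(\Phi - E_{xs}E_{ss}^\dagger)^T\}$$
$$= \operatorname{tr}\{E_{xx} - E_{xs}E_{ss}^\dagger E_{sx}\} + \|(\Phi - E_{xs}E_{ss}^\dagger)E_{ss}^{1/2}\|^2. \tag{13.17}$$

The minimum of this functional subject to the rank constraint (13.4) is achieved if

$$\Phi E_{ss}^{1/2} = \Phi_r \tag{13.18}$$

(see [5]). Here we have used

$$E_{ss}^\dagger E_{ss}^{1/2} = (E_{ss}^{1/2}E_{ss}^{1/2})^\dagger E_{ss}^{1/2} = (E_{ss}^{1/2})^\dagger.$$

The necessary and sufficient condition (see [2]) for (13.18) to have a solution
is readily verified to hold and provides the solution $\Phi = \Phi^0$. The theorem is
proved.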
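The construction in Theorem 13.4.1 can be sketched numerically as follows. This is a minimal sketch, not the authors' implementation: it assumes NumPy is available, replaces the expectations by sample averages over the columns of the data matrices, takes $M_r = 0$, and computes $(E_{ss}^{1/2})^\dagger$ from an eigendecomposition with a hypothetical relative cutoff `tol` for near-zero eigenvalues. All function names are illustrative.

```python
import numpy as np


def features(Y):
    """Columns of Y are realizations of y; return columns s = [1; y; y_1*y; ...; y_n*y]."""
    n, N = Y.shape
    Z = np.vstack([Y[j] * Y for j in range(n)])        # block j holds z_j = y_j * y
    return np.vstack([np.ones((1, N)), Y, Z])


def pinv_sqrt(M, tol=1e-10):
    """Pseudo-inverse of the symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh((M + M.T) / 2.0)
    inv_root = np.zeros_like(w)
    keep = w > tol * w.max()                            # drop numerically zero eigenvalues
    inv_root[keep] = 1.0 / np.sqrt(w[keep])
    return (V * inv_root) @ V.T


def fixed_rank_transform(X, S, r):
    """Return Phi0 = Phi_r (E_ss^{1/2})^+ with M_r = 0, using sample moments."""
    N = X.shape[1]
    Exs = X @ S.T / N
    Ess = S @ S.T / N
    W = pinv_sqrt(Ess)
    U, sig, Vt = np.linalg.svd(Exs @ W, full_matrices=False)
    Phi_r = (U[:, :r] * sig[:r]) @ Vt[:r]               # rank-r truncation of the SVD
    return Phi_r @ W


def split_phi(Phi, n):
    """Recover A and B_0..B_n; 1-based columns jn+2 : jn+n+1 are the slice [jn+1 : jn+n+1]."""
    A = Phi[:, 0]
    B = [Phi[:, j * n + 1: j * n + n + 1] for j in range(n + 1)]
    return A, B
```

Applying `fixed_rank_transform` to the raw observations instead of `features(Y)` gives a rank-constrained linear transform of the same rank, which is a convenient baseline for comparison on sample data.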


Remark 1. The proof above is based on Lemma 13.3.1. The first equation in 
(13.8) has been presented in [8] but without proof. 


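Theorem 13.4.2 Let

$$\Delta = \|(\mathbb{E}_{xz} - \mathbb{E}_{xy}\mathbb{E}_{yy}^\dagger\mathbb{E}_{yz})(D^\dagger)^{1/2}\|^2.$$

The error associated with the transform $T^0$ is

$$E[\|x - T^0(y)\|^2] = \operatorname{tr}\{\mathbb{E}_{xx}\} + \sum_{i=r+1}^{k}\sigma_i^2 - \|\mathbb{E}_{xy}(\mathbb{E}_{yy}^\dagger)^{1/2}\|^2 - \Delta.$$

Proof. By Lemma 13.3.3, it follows from (13.17) and (13.18) that

$$E[\|x - T^0(y)\|^2] = \operatorname{tr}\{E_{xx} - E_{xs}E_{ss}^\dagger E_{sx}\} + \|U\Sigma V^T - \Phi_r\|^2$$
$$= \operatorname{tr}\{\mathbb{E}_{xx}\} - \|\mathbb{E}_{xy}(\mathbb{E}_{yy}^\dagger)^{1/2}\|^2 - \Delta + \|U\Sigma V^T - \Phi_r\|^2,$$

where

$$\|U\Sigma V^T - \Phi_r\|^2 = \sum_{j=r+1}^{k}\sigma_j^2$$

(see [5]). This proves the theorem.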


13.5 Comparison of the transform T° and the GKLT 


The GKLT is a particular case of our transform $T^0$ with $A^0 = O$ and each
$B_j^0 = O$ for $j = 1,\ldots,n$ in (13.16), where $O$ is the corresponding zero vector or zero matrix.
To compare the transform $T^0$ with the GKLT, we put $A^0 = O$ in (13.16).
Then the vector $s$ in (13.14) can be written as $s = \tilde{s} = [y^T\ z^T]^T$. We denote
by $\tilde{\sigma}_j$ the singular values in (13.14) for $s = \tilde{s}$ and by $\tilde{T}^0$ the transform which
follows from (13.7) and (13.16) with $A^0 = O$ and $s = \tilde{s}$. We denote the GKLT
by $H$.


Theorem 13.5.1 Let $\vartheta_1,\ldots,\vartheta_l$ be the nonzero singular values of the matrix
$E_{xy}(E_{yy}^{1/2})^\dagger$, $\operatorname{rank} H = p \le l$ and $D = E_{zz} - E_{zy}E_{yy}^\dagger E_{yz}$. If

$$\sum_{j=r+1}^{k}\tilde{\sigma}_j^2 < \|(E_{xz} - E_{xy}E_{yy}^\dagger E_{yz})(D^\dagger)^{1/2}\|^2 + \sum_{i=p+1}^{l}\vartheta_i^2, \tag{13.19}$$

then the error associated with the transform $\tilde{T}^0$ is less than that associated
with $H$, that is,

$$E[\|x - \tilde{T}^0(y)\|^2] < E[\|x - Hy\|^2].$$

Proof. It is easily shown that

$$E[\|x - Hy\|^2] = \operatorname{tr}\{E_{xx} - E_{xy}E_{yy}^\dagger E_{yx}\} + \sum_{i=p+1}^{l}\vartheta_i^2.$$

Hence

$$E[\|x - Hy\|^2] - E[\|x - \tilde{T}^0(y)\|^2] = \|(E_{xz} - E_{xy}E_{yy}^\dagger E_{yz})(D^\dagger)^{1/2}\|^2 + \sum_{i=p+1}^{l}\vartheta_i^2 - \sum_{j=r+1}^{k}\tilde{\sigma}_j^2,$$

giving the desired result.


Condition (13.19) is not restrictive and is normally satisfied in practice. 
In this connection, see the results of the simulations in Section 13.8. 


13.6 Solution of the unconstrained minimization 
problem (13.3) 


We now address the solution of the minimization problem (13.3) without the 
constraint (13.4). This is important in its own right. The solution is a special 
form of the transform T° and represents a model of the optimal nonlinear
filter with x an actual signal and y the observed data.


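Let

$$D = \begin{bmatrix} D_{11} & \cdots & D_{1n}\\ D_{21} & \cdots & D_{2n}\\ \vdots & & \vdots\\ D_{n1} & \cdots & D_{nn}\end{bmatrix} \quad\text{and}\quad G = [G_1\ G_2 \ldots G_n],$$

where

$$D_{ij} = \mathbb{E}_{z_iz_j} - \mathbb{E}_{z_iy}\mathbb{E}_{yy}^\dagger\mathbb{E}_{yz_j} \in \mathbb{R}^{n\times n} \quad\text{and}\quad G_j = \mathbb{E}_{xz_j} - \mathbb{E}_{xy}\mathbb{E}_{yy}^\dagger\mathbb{E}_{yz_j} \in \mathbb{R}^{m\times n}$$

for $i,j = 1,\ldots,n$. We denote a solution to the unconstrained problem (13.3)
using the same symbols as before, that is, with $A^0$ and $B_j^0$ for $j = 0,\ldots,n$.


Theorem 13.6.1 The solution to the problem (13.3) is given by

$$A^0 = E[x] - B_0^0E[y] - \sum_{k=1}^{n} B_k^0E[z_k], \tag{13.20}$$
$$B_0^0 = \Big(\mathbb{E}_{xy} - \sum_{k=1}^{n} B_k^0\mathbb{E}_{z_ky}\Big)\mathbb{E}_{yy}^\dagger + M_1[I - \mathbb{E}_{yy}^{1/2}(\mathbb{E}_{yy}^{1/2})^\dagger], \tag{13.21}$$
$$[B_1^0\ B_2^0 \ldots B_n^0] = GD^\dagger + M_2[I - D^{1/2}(D^{1/2})^\dagger], \tag{13.22}$$

where $B_1^0, B_2^0,\ldots,B_n^0 \in \mathbb{R}^{m\times n}$, and $M_1 \in \mathbb{R}^{m\times n}$ and $M_2 \in \mathbb{R}^{m\times n^2}$ are arbitrary
matrices.


Theorem 13.6.2 Let

$$D^\dagger = \begin{bmatrix} Q_{11} & \cdots & Q_{1n}\\ Q_{21} & \cdots & Q_{2n}\\ \vdots & & \vdots\\ Q_{n1} & \cdots & Q_{nn}\end{bmatrix},$$

where $Q_{ij} \in \mathbb{R}^{n\times n}$ for $i,j = 1,\ldots,n$. The error associated with the transform
$T^{(1)}$ defined by

$$T^{(1)}(y) = A^0 + \sum_{j=0}^{n} B_j^0 z_j,$$

with $A^0$ and $B_j^0$ given by (13.20)-(13.22), is

$$E[\|x - T^{(1)}(y)\|^2] = \operatorname{tr}\{\mathbb{E}_{xx}\} - \|\mathbb{E}_{xy}(\mathbb{E}_{yy}^\dagger)^{1/2}\|^2 - \sum_{i=1}^{n}\|G_iQ_{ii}^{1/2}\|^2 - \sum_{\substack{j,k=1,\ldots,n\\ j\ne k}}\operatorname{tr}\{G_jQ_{jk}G_k^T\}.$$

The proofs of both theorems are similar to those of Theorems 1 and 2 in
[14].

It follows from Theorem 13.6.2 that the filter $T^{(1)}$ has a much smaller
associated error than the error

$$E[\|x - H^{(1)}(y)\|^2] = \operatorname{tr}\{E_{xx}\} - \|E_{xy}(E_{yy}^\dagger)^{1/2}\|^2$$

associated with the optimal linear filter $H^{(1)} = E_{xy}E_{yy}^\dagger$ in [8].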


13.7 Applications and further modifications 
and extensions 


Applications of our technique are abundant and include, for example, simul- 
taneous filtering and compression of noisy stochastic signals, feature selection 
in pattern recognition, blind channel equalization and the optimal rejection 
of colored noise in some neural systems. For the background to these appli- 
cations see [1, 4, 8, 18, 19]. 

The efficiency of a fixed-rank transform is mainly characterized by two
parameters: the compression ratio (see [8]) and the accuracy of signal restoration.
The signal compression is realized through the following device. Let $p$
be the rank of the transform $H$. Then $H$ can be represented in the form
$H = H_1H_2$, where the matrix $H_2 \in \mathbb{R}^{p\times n}$ relates to compression of the signal
and the matrix $H_1 \in \mathbb{R}^{m\times p}$ to its reconstruction. The compression ratio
of the transform is given by $c_H = p/m$. Similarly the transform $T^0$ can be
represented in the form $\Phi^0 = C_1C_2$, where, for example, $C_1 = U_r \in \mathbb{R}^{m\times r}$
and $C_2 = \Sigma_rV_r^T(E_{ss}^{1/2})^\dagger \in \mathbb{R}^{r\times q}$, so that the matrix $C_2$ is associated with
compression and the matrix $C_1$ with signal reconstruction. The compression
ratio of the transform $T^0$ is given by $c_T = r/m$.

Modifications of the method are motivated mostly by a desire to reduce the
computation entailed in the estimation of the singular-value decomposition
of $E_{xs}(E_{ss}^{1/2})^\dagger$. This can be done by exploiting the representation (13.20)-(13.22)
in such a way that the matrices $B_1^0,\ldots,B_n^0$ in (13.22) are estimated
by a scheme similar to the Gaussian elimination scheme in linear algebra. A
rank restriction can then be imposed on the matrices $B_1^0,\ldots,B_n^0$ that will
bring about reduction of the computational work in finding certain
pseudo-inverse matrices.

Extensions of the technique can be made in the following directions. First,
the method can be combined with a special iterative procedure to improve the
associated accuracy of the signal estimation. Secondly, an attractive extension
may be based on the representation of the operator $T$ in (13.6) in the form

$$T(y) = A_0 + A_1y + A_2(y,y) + \cdots + A_k(y,\ldots,y),$$

where $A_k : (\mathbb{R}^n)^k \to \mathbb{R}^m$ is a $k$-linear operator.

Thirdly, a natural extension is to apply the technique to the optimal synthesis
of nonlinear systems. Background material can be found in papers by
Sandberg (see, for example, [11, 12]) and also [6, 7, 13, 16].


13.8 Simulations 


The aim of our simulations is to demonstrate the advantages of T° over the 
GKLT H. To this end, we use the standard digitized image “Lena” presented 
by a 256 x 256 matrix X. 

To compare the transforms $T^0$ and $H$ for different noisy signals, we partition
the matrix $X$ into 128 submatrices $X_{ij} \in \mathbb{R}^{16\times 32}$ with $i = 1,\ldots,16$ and
$j = 1,\ldots,8$ and treat each $X_{ij}$ as a set of 32 realizations of a random vector
so that a column of $X_{ij}$ represents the vector realization.

Observed data have been simulated in the form

$$Y_{ij} = 10 * R_{ij}^{(1)} \mathbin{.*} X_{ij} + 500 * R_{ij}^{(2)},$$

with $i = 1,\ldots,16$ and $j = 1,\ldots,8$, where each $R_{ij}^{(1)}$ is a matrix with entries
uniformly distributed over the interval (0,1) and each $R_{ij}^{(2)}$ is a matrix with
normally distributed entries with mean 0 and variance 1. The symbol $.*$
signifies Hadamard matrix multiplication.

The transforms $T^0$ and $H$ have been applied to each pair $X_{ij}$, $Y_{ij}$ with the
same rank $r = 8$, that is, with the same compression ratio. The corresponding
covariance matrices have been estimated from the samples $X_{ij}$ and $Y_{ij}$.
Special methods for their estimation can be found, for example, in [3, 9, 10]
and [17].
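The data-generation step described above can be sketched in Python as follows. This is only an illustration of the partitioning and of the observation model $Y_{ij} = 10 * R^{(1)}_{ij} \mathbin{.*} X_{ij} + 500 * R^{(2)}_{ij}$: the image used here is a synthetic placeholder rather than "Lena", and the function names and the seed are assumptions.

```python
import random


def partition(X, rows=16, cols=32):
    """Split an image (list of rows) into blocks X_ij of size rows x cols, keyed by (i, j)."""
    blocks = {}
    for i in range(len(X) // rows):
        for j in range(len(X[0]) // cols):
            blocks[(i + 1, j + 1)] = [
                row[j * cols:(j + 1) * cols] for row in X[i * rows:(i + 1) * rows]
            ]
    return blocks


def observe(block, rng):
    """Entrywise Y = 10 * R1 * X + 500 * R2, with R1 ~ U(0,1) and R2 ~ N(0,1)."""
    return [
        [10.0 * rng.random() * x + 500.0 * rng.gauss(0.0, 1.0) for x in row]
        for row in block
    ]


rng = random.Random(0)
# Placeholder 256 x 256 "image" with grey levels in 0..255 (not the Lena image).
X = [[float((7 * r + 13 * c) % 256) for c in range(256)] for r in range(256)]
blocks = partition(X)
Y = {key: observe(block, rng) for key, block in blocks.items()}
```

Each block has 32 columns, which play the role of 32 realizations of a 16-dimensional random vector when the sample moments are formed.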

Table 13.1 represents the values of ratios

$$\rho_{ij} = \|X_{ij} - H(Y_{ij})\|^2 / \|X_{ij} - T^0(Y_{ij})\|^2$$

for each $i = 1,\ldots,16$ and $j = 1,\ldots,8$, where $\|X_{ij} - H(Y_{ij})\|^2$ and
$\|X_{ij} - T^0(Y_{ij})\|^2$ are the errors associated with the transforms $H$ and
$T^0$, respectively. The value $\rho_{ij}$ is placed in the cell situated in row $i$ and
column $j$.


Table 13.1 Ratios $\rho_{ij}$ of the error associated with the GKLT $H$ to that of the transform
$T^0$ with the same compression ratios


 i \ j        1        2        3        4        5        6        7        8
   1     5268.3   3880.6   1864.5   1094.7   2605.4   2878.0   4591.6   1052.7
   2     2168.4    995.1   1499.7    338.6   1015.1   3324.0   2440.5    336.1
   3     2269.3    803.5     1584     1364     66.7   2545.4   1227.1    326.6
   4     1394.3    716.2    173.7     62.9    451.4    721.6    227.8    691.6
   5     3352.4   1970.1     98.9    192.8    390.0     92.8    680.4   3196.8
   6     1781.5    758.6     93.6     79.3     59.8    223.2    110.5   2580.8
   7     2077.4   1526.0     67.4     30.3    172.5     70.3   1024.4   4749.3
   8     3137.2    901.2     27.1     38.5    475.3    445.6   1363.2   2917.5
   9     2313.2    117.0     18.0     39.3    180.6    251.0   1500.4   2074.2
  10     1476.0     31.5     35.7    119.3    859.3    883.5   2843.1   3270.6
  11     1836.7     35.3     36.4   1015.5    460.6    487.0   2843.1   8902.3
  12     1808.5     74.5     38.2    419.0    428.0    387.2   2616.9   8895.3
  13     1849.1     17.6     30.3     4924   1175.5    135.8   1441.9   1649.2
  14     2123.6     54.9     38.6    302.0   1310.5   2193.8   2681.5   1347.9
  15     1295.1    136.3     31.8     711d   2561.7   5999.2    550.7    996.0
  16      21255     1149     31.5    732.3   2258.2   5999.2    550.7    427.1


Inspection of Table 13.1 shows that, for the same compression ratio, the
error associated with the transform $T^0$ is smaller than that of the transform $H$
by a factor ranging from 17.6 to 8,895.3.

We also applied our filter $T^{(1)}$ (constructed from Theorem 13.6.1) and the
optimal linear filter $H^{(1)} = E_{xy}E_{yy}^\dagger$ to the same signals and data as above,
that is, to each pair $X_{ij}$, $Y_{ij}$ with $i = 1,\ldots,16$ and $j = 1,\ldots,8$.

The errors associated with filters $T^{(1)}$ and $H^{(1)}$ are

$$\|X - X_T\|^2 = 1.4 \times 10^{-3} \quad\text{and}\quad \|X - X_H\|^2 = 3.9 \times 10^{7},$$

where the matrices $X_T$ and $X_H$ have been constructed from the submatrices
$X_{Tij} \in \mathbb{R}^{16\times 32}$ and $X_{Hij} \in \mathbb{R}^{16\times 32}$ correspondingly, that is, $X_T = \{X_{Tij}\} \in
\mathbb{R}^{256\times 256}$ and $X_H = \{X_{Hij}\} \in \mathbb{R}^{256\times 256}$ with $X_{Tij} = T^{(1)}(Y_{ij})$ the estimate
of $X_{ij}$ by the filter $T^{(1)}$ and $X_{Hij} = H^{(1)}Y_{ij}$ that of $X_{ij}$ by the filter $H^{(1)}$.
The error produced by the filter $H^{(1)}$ is $2.7 \times 10^{10}$ times greater than that of
the filter $T^{(1)}$.

Figures 13.1(c) and (d) represent images reconstructed after filtering and
compression of the noisy image in Figure 13.1(b) by the transforms $H$ and
$T^0$ (which have been applied to each of the subimages $X_{ij}$, $Y_{ij}$ with the same
compression ratio).

[Figure 13.1, panels: (a) Given signals. (b) Observed signals. (c) Reconstruction after filtering and compression by the GKLT. (d) Reconstruction after filtering and compression by our transform with the same rank as that of the GKLT. (e) Estimates by the filter $H^{(1)}$. (f) Estimates by the filter $T^{(1)}$.]

Fig. 13.1 Illustration of the performance of our method.

[Figure 13.2, panel: (b) Estimates of the 244th column in the matrix $X$.]

Fig. 13.2 Typical examples of a column reconstruction in the matrix $X$ (image "Lena")
after filtering and compression of the observed noisy image (Figure 13.1b) by transforms
$H$ (line with circles) and $T^0$ (solid line) of the same rank. In both subfigures, the
plot of the column (solid line) virtually coincides with the plot of the estimate by the
transform $T^0$.

Figures 13.1(e) and (f) represent estimates of the noisy image in Figure
13.1(b) by filters $H^{(1)}$ and $T^{(1)}$, respectively.

To illustrate the simulation results in a different way, we present typical 
examples of the plots of a column estimate in matrix X by transforms H 
and T°. Note that differences of the estimate by T° from the column plot are 
almost invisible. 

Table 13.1 and Figures 13.1 and 13.2 demonstrate the advantages of our 
technique. 


13.9 Conclusion 


The recently discovered generalization [8] of the Karhunen—Loéve transform 
(GKLT) is the best linear transform of fixed rank. In this chapter we have 
proposed and justified a new nonlinear transform which possesses substan- 
tially smaller associated error than that of the GKLT of the same rank. 

A number of potential applications, modifications and extensions have
been described. Numerical simulations demonstrate the clear advantages of
our technique.


Chapter 14


Optimal capacity assignment 
in general queueing networks 


P. K. Pollett 


Abstract We consider the problem of how best to assign the service capacity 
in a queueing network in order to minimize the expected delay under a cost 
constraint. We study systems with several types of customers, general service 
time distributions, stochastic or deterministic routing, and a variety of ser- 
vice regimes. For such networks there are typically no analytical formulae for 
the waiting-time distributions. Thus we shall approach the optimal alloca- 
tion problem using an approximation technique: specifically, the residual-life 
approximation for the distribution of queueing times. This work generalizes 
results of Kleinrock, who studied networks with exponentially distributed 
service times. We illustrate our results with reference to data networks. 


Key words: Capacity assignment, queueing network,  residual-life 
approximation 


14.1 Introduction 


Since their inception, queueing network models have been used to study a 
wide variety of complex stochastic systems involving the flow and interaction 
of individual items: for example, “job shops,” where manufactured items are 
fashioned by various machines in turn [7]; the provision of spare parts for 
collections of machines [17]; mining operations, where coal faces are worked 
in turn by a number of specialized machines [12]; and delay networks, where 
packets of data are stored and then transmitted along the communications 
links that make up the network [18, 1]. For some excellent recent expositions,
which describe these and other instances where queueing networks have been
applied, see [2, 6] and the important text by Serfozo [16].

In each of the above-mentioned systems it is important to be able to determine
how best to assign service capacity so as to optimize various performance
measures, such as the expected delay or the expected number of items
(customers) in the network. We shall study this problem in greater generality 
than has previously been considered. We allow different types of customers, 
general service time distributions, stochastic or deterministic routing, and a 
variety of service regimes. The basic model is that of Kelly [8], but we do not 
assume that the network has the simplifying feature of quasi-reversibility [9]. 


14.2 The model 


We shall suppose that there are $J$ queues, labeled $j = 1,2,\ldots,J$. Customers
enter the network from external sources according to independent
Poisson streams, with type $u$ customers arriving at rate $\nu_u$ (customers per
second). Service times at queue $j$ are assumed to be mutually independent,
with an arbitrary distribution $F_j(x)$ that has mean $1/\mu_j$ (units of service)
and variance $\sigma_j^2$. For simplicity we shall assume that each queue operates
under the usual first-come-first-served (FCFS) discipline and that a total effort
(or capacity) of $\phi_j$ (units per second) is assigned to queue $j$. We shall
explain later how our results can be extended to deal with other queueing
disciplines.

We shall allow for two possible routing procedures: fixed routing, where 
there is a unique route specified for each customer type, and random alterna- 
tive routing, where one of a number of possible routes is chosen at random. 
(We do not allow for adaptive or dynamic routing, where routing decisions 
are made on the basis of the observed traffic flow.) For fixed routing we define 
$R(u)$ to be the (unique) ordered list of queues visited by type $u$ customers. In
particular, let $R(u) = \{r_u(1),\ldots,r_u(s_u)\}$, where $s_u$ is the number of queues
visited by a type $u$ customer and $r_u(s)$ is the queue visited at stage $s$ along
its route ($r_u(s)$, $s = 1,2,\ldots,s_u$, are assumed to be distinct). It is perhaps
surprising that random alternative routing can be accommodated within the
framework of fixed routing (see Exercise 3.1.2 of [10]). If there are several
alternative routes for a given type $u$, then one simply provides a finer type
classification for customers using these routes. We label the alternative routes
as $(u,i)$, $i = 1,2,\ldots,N(u)$, where $N(u)$ is the number of alternative routes
for type $u$ customers, and we replace $R(u)$ by $R(u,i) = \{r_{ui}(1),\ldots,r_{ui}(s_{ui})\}$,
for $i = 1,2,\ldots,N(u)$, where now $r_{ui}(s)$ is the queue visited at stage $s$ along
alternative route $i$ and $s_{ui}$ is the number of stages. We then replace $\nu_u$ by
$\nu_{ui} = \nu_uq_{ui}$, where $q_{ui}$ is the probability that alternative route $i$ is chosen.
Clearly $\nu_u = \sum_{i=1}^{N(u)}\nu_{ui}$, and so the effect is to thin the Poisson stream
of arrivals of type $u$ into a collection of independent Poisson streams, one
for each type $(u,i)$. We should think of customers as being identified by
their type, whether this be simply $u$ for fixed routing, or the finer classification
$(u,i)$ for alternative routing. For convenience, let us denote by $\mathcal{T}$
the set of all types, and suppose that, for each $t$ in $\mathcal{T}$, customers of type $t$
arrive according to a Poisson stream with rate $\nu_t$ and traverse the route
$R(t) = \{r_t(1),\ldots,r_t(s_t)\}$, a collection of $s_t$ distinct queues. This is the network
of queues with customers of different types described in [8]. If all service
times have a common exponential distribution with mean $1/\mu$ (and hence
$\mu_j = \mu$), the model is analytically tractable. In equilibrium the queues behave
independently: indeed, as if they were isolated, each with independent
Poisson arrival streams (independent among types). For example, if we let
$\alpha_j(t,s) = \nu_t$ when $r_t(s) = j$, and $\alpha_j(t,s) = 0$ otherwise, so that the arrival
rate at queue $j$ is given by $\alpha_j = \sum_{t\in\mathcal{T}}\sum_{s=1}^{s_t}\alpha_j(t,s)$, and the demand (in
units per second) by $a_j = \alpha_j/\mu$, then, provided the system is stable ($a_j < \phi_j$
for each $j$), the expected number of customers at queue $j$ is $\bar{n}_j = a_j/(\phi_j - a_j)$
and the expected delay is $W_j = \bar{n}_j/\alpha_j = 1/(\mu\phi_j - \alpha_j)$; for further details,
see Section 3.1 of [10].
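The bookkeeping in this section can be illustrated with a short Python sketch: thinning a Poisson stream over alternative routes, accumulating the arrival rate at each queue over all types and stages, and evaluating the exponential-case formulas for the mean number of customers and the mean delay. The routes, rates and capacities below are hypothetical, as are the function names.

```python
def thin(nu_u, q):
    """Split the Poisson rate nu_u over alternative routes chosen with probabilities q."""
    return [nu_u * qi for qi in q]


def arrival_rates(routes, nu, J):
    """Arrival rate at each queue: the sum of nu_t over all types t whose route visits it."""
    alpha = [0.0] * J
    for t, route in routes.items():
        for j in route:                  # queues are labeled 1..J and distinct on a route
            alpha[j - 1] += nu[t]
    return alpha


def exponential_case(alpha, mu, phi):
    """Return (mean number, mean delay) per queue when all service is exponential, mean 1/mu."""
    result = []
    for rate, cap in zip(alpha, phi):
        a = rate / mu                    # demand in units per second
        if a >= cap:
            raise ValueError("unstable queue: demand must be below capacity")
        result.append((a / (cap - a), 1.0 / (mu * cap - rate)))
    return result


# Type A uses a fixed route; type B (rate 4) has two alternative routes.
nu_B = thin(4.0, [0.25, 0.75])
routes = {("A", 1): [1, 2], ("B", 1): [1, 3], ("B", 2): [2, 3]}
nu = {("A", 1): 2.0, ("B", 1): nu_B[0], ("B", 2): nu_B[1]}

alpha = arrival_rates(routes, nu, J=3)
metrics = exponential_case(alpha, mu=2.0, phi=[3.0, 5.0, 4.0])
```

In this example every queue is loaded to half of its capacity, so each has one customer on average, while the delays differ because the arrival rates differ.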


14.3 The residual-life approximation 


Under our assumption that service times have arbitrary distributions, the 
model is rendered intractable. In particular, there are no analytical formu- 
lae for the delay distributions. We shall therefore adopt one of the many 
approximation techniques. Consider a particular queue $j$ and let $Q_j(x)$ be
the distribution function of the queueing time, that is, the period of time a
customer spends at queue $j$ before its service begins. The residual-life
approximation, developed by the author [14], provides an accurate approximation
for $Q_j(x)$:

$$Q_j(x) \simeq \sum_{n=0}^{\infty}\Pr(n_j = n)\,G_j^{n*}(x), \tag{14.1}$$

where $G_j(x) = \mu_j\int_0^{\phi_jx}(1 - F_j(y))\,dy$ and $G_j^{n*}(x)$ denotes the $n$-fold convolution
of $G_j(x)$. The distribution of the number of customers $n_j$ at queue $j$,
which appears in (14.1), is that of a corresponding quasi-reversible network [10]:
specifically, a network of symmetric queues obtained by imposing
a symmetry condition at each queue $j$. The term residual-life approximation
comes from renewal theory; $G_j(x)$ is the residual-life distribution corresponding
to the (lifetime) distribution $F_j(x\phi_j)$.

One immediate consequence of (14.1) is that the expected queueing time
$\bar{Q}_j$ is approximated by $\bar{Q}_j \simeq \bar{n}_j(1 + \mu_j^2\sigma_j^2)/(2\mu_j\phi_j)$, where $\bar{n}_j$ is the expected
number of customers at queue $j$ in the corresponding quasi-reversible network.
Hence the expected delay at queue $j$ is approximated as follows:

$$\bar{W}_j \simeq \frac{1}{\mu_j\phi_j} + \frac{1 + \mu_j^2\sigma_j^2}{2\mu_j\phi_j}\,\bar{n}_j.$$

Under the residual-life approximation, it is only $\bar{n}_j$ which changes when the
service discipline is altered. In the current context, the FCFS discipline, which
is assumed to be in operation everywhere in the network, is replaced by a
preemptive-resume last-come-first-served discipline, giving $\bar{n}_j = a_j/(\phi_j - a_j)$
with $a_j = \alpha_j/\mu_j$, for each $j$, and hence

$$\bar{W}_j \simeq \frac{1}{\mu_j\phi_j} + \frac{1 + \mu_j^2\sigma_j^2}{2\mu_j\phi_j}\left(\frac{\alpha_j}{\mu_j\phi_j - \alpha_j}\right). \tag{14.2}$$

Simulation results presented in [14] justify the approximation by assessing
its accuracy under a variety of conditions. Even for relatively small networks
with generous mixing of traffic, it is accurate, and the accuracy improves
as the size and complexity of the network increases. (The approximation is
very accurate in the tails of the queueing time distributions and so it allows
an accurate prediction to be made of the likelihood of extreme queueing
times.) For moderately large networks the approximation becomes worse as
the coefficient of variation $\mu_j\sigma_j$ of the service-time distribution at queue $j$
deviates markedly from 1, the value obtained in the exponential case.
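A small Python sketch of the delay approximation (14.2) is given below, written as $\bar W_j \simeq 1/(\mu_j\phi_j) + \big((1+\mu_j^2\sigma_j^2)/(2\mu_j\phi_j)\big)\,\alpha_j/(\mu_j\phi_j - \alpha_j)$. The function name and the numerical values are illustrative assumptions. When $\mu_j\sigma_j = 1$ (the exponential case) the expression collapses to $1/(\mu_j\phi_j - \alpha_j)$, which gives a convenient sanity check.

```python
def residual_life_delay(alpha, mu, sigma, phi):
    """Approximate expected delay at one queue under the residual-life approximation.

    alpha: arrival rate, 1/mu: mean service requirement, sigma: its standard
    deviation, phi: capacity assigned to the queue.
    """
    if mu * phi <= alpha:
        raise ValueError("unstable queue: need mu * phi > alpha")
    c = (mu * sigma) ** 2                       # squared coefficient of variation
    n_bar = alpha / (mu * phi - alpha)          # mean number in the quasi-reversible analogue
    return 1.0 / (mu * phi) + (1.0 + c) / (2.0 * mu * phi) * n_bar


# Hypothetical queue: arrival rate 3, mean service requirement 1/2, capacity 3.
w_exponential = residual_life_delay(alpha=3.0, mu=2.0, sigma=0.5, phi=3.0)
w_deterministic = residual_life_delay(alpha=3.0, mu=2.0, sigma=0.0, phi=3.0)
w_bursty = residual_life_delay(alpha=3.0, mu=2.0, sigma=1.0, phi=3.0)
```

With the same load, a smaller service-time variance gives a smaller approximate delay, and a larger variance gives a larger one.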


14.4 Optimal allocation of effort 


We now turn our attention to the problem of how best to apportion resources
so that the expected network delay, or equivalently (by Little’s theorem)
the expected number of customers in the network, is minimized. We shall
suppose that there is some overall network budget $F$ (dollars) which cannot
be exceeded, and that the cost of operating queue $j$ is a function $f_j$ of its
capacity. Suppose that the cost of operating queue $j$ is proportional to $\phi_j$,
that is, $f_j(\phi_j) = f_j\phi_j$ (the units of $f_j$ are dollars per unit of capacity, or
dollar-seconds per unit of service). Thus we should choose the capacities
subject to the cost constraint

$$\sum_{j=1}^{J} f_j\phi_j = F. \tag{14.3}$$

We shall suppose that the average delay of customers at queue $j$ is adequately
approximated by (14.2). Using Little’s theorem, we obtain an approximate
expression for the mean number $\bar{m}$ of customers in the network. This is


$$\bar{m} \simeq \sum_{j=1}^{J}\left[\frac{\alpha_j}{\mu_j\phi_j} + \frac{\alpha_j^2(1 + \mu_j^2\sigma_j^2)}{2\mu_j\phi_j(\mu_j\phi_j - \alpha_j)}\right] = \sum_{j=1}^{J}\left[\frac{a_j}{\phi_j} + \frac{a_j^2(1 + c_j)}{2\phi_j(\phi_j - a_j)}\right],$$

where $c_j = \mu_j^2\sigma_j^2$ is the squared coefficient of variation of the service time
distribution $F_j(x)$. We seek to minimize $\bar{m}$ over $\phi_1,\ldots,\phi_J$ subject to (14.3).
To this end, we introduce a Lagrange multiplier $1/\lambda^2$; our problem then
becomes one of minimizing

$$L(\phi_1,\ldots,\phi_J;\lambda^{-2}) = \bar{m} + \frac{1}{\lambda^2}\left(\sum_{j=1}^{J} f_j\phi_j - F\right).$$

Setting $\partial L/\partial\phi_j = 0$ for fixed $j$ yields a quartic polynomial equation in $\phi_j$:

$$2f_j\phi_j^4 - 4a_jf_j\phi_j^3 + 2a_j(a_jf_j - \lambda^2)\phi_j^2 - 2\epsilon_ja_j^2\lambda^2\phi_j + \epsilon_ja_j^3\lambda^2 = 0,$$

where $\epsilon_j = c_j - 1$, and our immediate task is to find solutions such that
$\phi_j > a_j$ (recall that this latter condition is required for stability). The task is
simplified by observing that the transformation $\phi_jf_j/F \to \phi_j$, $a_jf_j/F \to a_j$,
$\lambda^2/F \to \lambda^2$, reduces the problem to one with unit costs $f_j = F = 1$, whence
the above polynomial equation becomes

$$2\phi_j^4 - 4a_j\phi_j^3 + 2a_j(a_j - \lambda^2)\phi_j^2 - 2\epsilon_ja_j^2\lambda^2\phi_j + \epsilon_ja_j^3\lambda^2 = 0, \tag{14.4}$$

and the constraint becomes

$$\phi_1 + \phi_2 + \cdots + \phi_J = 1. \tag{14.5}$$

It is easy to verify that, if service times are exponentially distributed ($\epsilon_j = 0$
for each $j$), there is a unique solution to (14.4) on $(a_j,\infty)$, given by $\phi_j =
a_j + |\lambda|\sqrt{a_j}$. Upon application of the constraint (14.5) we arrive at the optimal
capacity assignment $\phi_j = a_j + \sqrt{a_j}\,(1 - \sum_{k=1}^{J}a_k)/(\sum_{k=1}^{J}\sqrt{a_k})$, for unit costs.
In the case of general costs this becomes

$$\phi_j = a_j + \frac{1}{f_j}\left(F - \sum_{k=1}^{J} f_ka_k\right)\frac{\sqrt{f_ja_j}}{\sum_{k=1}^{J}\sqrt{f_ka_k}},$$

after applying the transformation. This is a result obtained by Kleinrock [11]
(see also [10]): the allocation proceeds by first assigning enough capacity to
meet the demand $a_j$, at each queue $j$, and then allocating a proportion of the
affordable excess capacity, $(F - \sum_{k=1}^{J}f_ka_k)/f_j$ (that which could be afforded
to queue $j$), in proportion to the square root of the cost $f_ja_j$ of meeting that
demand. In the case where some or all of the $\epsilon_j$, $j = 1,2,\ldots,J$, deviate from
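zero, (14.4) is difficult to solve analytically. We shall adopt a perturbation
technique, assuming that the Lagrange multiplier and the optimal allocation
take the following forms:

$$\lambda = \lambda_0 + \sum_{k=1}^{J}\lambda_{1k}\epsilon_k + O(\epsilon^2), \tag{14.6}$$
$$\phi_j = \phi_{0j} + \sum_{k=1}^{J}\phi_{1jk}\epsilon_k + O(\epsilon^2),\quad j = 1,\ldots,J, \tag{14.7}$$

where $O(\epsilon^2)$ denotes terms of order $\epsilon_i\epsilon_k$. The zero-th order terms come from
Kleinrock’s solution: specifically, $\phi_{0j} = a_j + \lambda_0\sqrt{a_j}$, $j = 1,\ldots,J$, where $\lambda_0 =
(1 - \sum_{k=1}^{J}a_k)/(\sum_{k=1}^{J}\sqrt{a_k})$. On substituting (14.6) and (14.7) into (14.4) we
obtain an expression for $\phi_{1jk}$ in terms of $\lambda_{1k}$, which in turn is calculated
using the constraint (14.5) and by setting $\epsilon_k = \delta_{kj}$ (the Kronecker delta). We
find that the optimal allocation, to first order, is

$$\phi_j = a_j + \lambda_0\sqrt{a_j} - \frac{\sqrt{a_j}}{\sum_{k=1}^{J}\sqrt{a_k}}\sum_{k\ne j}b_k\epsilon_k + \left(1 - \frac{\sqrt{a_j}}{\sum_{k=1}^{J}\sqrt{a_k}}\right)b_j\epsilon_j, \tag{14.8}$$

where $b_k = \tfrac{1}{4}\lambda_0a_k^{3/2}(a_k + 2\lambda_0\sqrt{a_k})/(a_k + \lambda_0\sqrt{a_k})^2$. For most practical applications,
higher-order solutions are required. To achieve this we can simplify
matters by using a single perturbation $\epsilon = \max_{1\le j\le J}|\epsilon_j|$. For each $j$ we
define a quantity $\beta_j = \epsilon_j/\epsilon$ and write $\phi_j$ and $\lambda$ as power series in $\epsilon$:

$$\phi_j = \sum_{n=0}^{\infty}\phi_{nj}\epsilon^n,\quad j = 1,\ldots,J,\qquad \lambda = \sum_{n=0}^{\infty}\lambda_n\epsilon^n. \tag{14.9}$$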
n=0 n=0 


Substituting as before into (14.4), and using (14.5), gives rise to an iterative 
scheme, details of which can be found in [13]. The first-order approximation is 
useful, nonetheless, in dealing with networks whose service-time distributions 
are all ‘close’ to exponential in the sense that their coefficients of variation do 
not differ significantly from 1. It is also useful in providing some insight into 
how the allocation varies as $\epsilon_j$, for fixed $j$, varies. Let $\phi_i'$, $i = 1,2,\ldots,J$, be
the new optimal allocation obtained after incrementing $\epsilon_j$ by a small quantity
$\delta > 0$. We find that to first order in $\delta$

$$\phi_j' - \phi_j = \left(1 - \frac{\sqrt{a_j}}{\sum_{k=1}^{J}\sqrt{a_k}}\right)b_j\delta,\qquad \phi_i' - \phi_i = -\frac{\sqrt{a_i}}{\sum_{k=1}^{J}\sqrt{a_k}}\,b_j\delta,\quad i \ne j.$$

Thus, if the coefficient of variation of the service-time distribution at a given
queue $j$ is increased (respectively decreased) by a small quantity $\delta$, then there
is an increase (respectively decrease) in the optimal allocation at queue $j$
which is proportional to $\delta$. All other queues experience a complementary
decrease (respectively increase) in their allocations and the resulting deficit
is reallocated in proportion to the square root of the demand.

In [13] empirical estimates were obtained for the radii of convergence of
the power series (14.9) for the optimal allocation. In all cases considered
there, the closest pole to the origin was on the negative real axis outside
the physical limits for $\epsilon_j$, which are of course $-1 < \epsilon_j < \infty$. The perturbation
technique is therefore useful for networks whose service-time distributions
are, for example, Erlang (gamma) ($-1 < \epsilon_j < 0$) or mixtures of
exponential distributions ($0 < \epsilon_j < \infty$) with not too large a coefficient of
variation.
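The perturbation approach of this section can be checked numerically. The Python sketch below works with unit costs and compares three allocations: the zero-th order (Kleinrock) allocation $a_j + \lambda_0\sqrt{a_j}$, a first-order allocation that adds $b_j\epsilon_j - (\sqrt{a_j}/\sum_k\sqrt{a_k})\sum_k b_k\epsilon_k$ with $b_k = \tfrac14\lambda_0a_k^{3/2}(a_k + 2\lambda_0\sqrt{a_k})/(a_k + \lambda_0\sqrt{a_k})^2$, and a reference solution obtained by solving the quartic optimality condition by bisection and adjusting $\lambda$ until the capacities sum to one. The first-order formula is the one obtained by linearizing the quartic about the exponential case; the function names, the nested bisection and the example demands are assumptions made for illustration.

```python
from math import sqrt


def kleinrock(a):
    """Zero-th order allocation for unit costs, together with lambda_0."""
    lam0 = (1.0 - sum(a)) / sum(sqrt(x) for x in a)
    return [x + lam0 * sqrt(x) for x in a], lam0


def first_order(a, eps):
    """Kleinrock's allocation plus the correction that is linear in eps."""
    phi0, lam0 = kleinrock(a)
    root_sum = sum(sqrt(x) for x in a)
    b = [0.25 * lam0 * x ** 1.5 * (x + 2 * lam0 * sqrt(x)) / (x + lam0 * sqrt(x)) ** 2
         for x in a]
    total = sum(bk * ek for bk, ek in zip(b, eps))
    return [p + bj * ej - sqrt(x) / root_sum * total
            for p, x, bj, ej in zip(phi0, a, b, eps)]


def quartic(phi, a, lam, e):
    """Left-hand side of the unit-cost optimality condition for one queue."""
    l2 = lam * lam
    return (2 * phi ** 4 - 4 * a * phi ** 3 + 2 * a * (a - l2) * phi ** 2
            - 2 * e * a * a * l2 * phi + e * a ** 3 * l2)


def solve_phi(a, lam, e):
    """Root of the quartic on (a, infinity), found by bisection (negative at a)."""
    lo, hi = a, a + 1.0
    while quartic(hi, a, lam, e) <= 0:
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if quartic(mid, a, lam, e) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reference_allocation(a, eps):
    """Adjust lambda by bisection until the capacities sum to one."""
    lo, hi = 0.0, 1.0
    while sum(solve_phi(x, hi, e) for x, e in zip(a, eps)) < 1.0:
        hi *= 2
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if sum(solve_phi(x, lam, e) for x, e in zip(a, eps)) < 1.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return [solve_phi(x, lam, e) for x, e in zip(a, eps)]


a = [0.1, 0.2, 0.3]            # hypothetical demands with total below 1
eps = [0.01, 0.0, 0.0]         # only the first queue deviates from exponential
phi_zero, lam_zero = kleinrock(a)
phi_first = first_order(a, eps)
phi_ref = reference_allocation(a, eps)
```

In this example the perturbed queue receives slightly more capacity than under Kleinrock's rule and the other queues slightly less, in line with the sensitivity discussion above.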


14.5 Extensions 


So far we have assumed that the capacity does not depend on the state of 
the queue (as a consequence of the FCFS discipline) and that the cost of 
operating a queue is a linear function of its capacity. Let us briefly consider 
some other possibilities. Let $\phi_j(n)$ be the effort assigned to queue $j$ when there
are $n$ customers present. If, for example, $\phi_j(n) = n\phi_j/(n + \eta - 1)$, where $\eta$
is a positive constant, the zero-th order allocation, optimal under (14.3), is
precisely the same as before (the case $\eta = 1$). For values of $\eta$ greater than 1
the capacity increases as the number of customers at queue $j$ increases and
levels off at a constant value $\phi_j$ as the number becomes large. If we allow $\eta$
to depend on $j$ we get a similar allocation but with the factor

$$\frac{\sqrt{f_ja_j}}{\sum_{k=1}^{J}\sqrt{f_ka_k}} \quad\text{replaced by}\quad \frac{\sqrt{f_j\eta_ja_j}}{\sum_{k=1}^{J}\sqrt{f_k\eta_ka_k}}$$

(see Exercise 4.1.6 of [10]). The higher-order analysis is very nearly the same
as before. The factor $1 + c_j$ is replaced by $\eta_j(1 + c_j)$; for the sake of brevity,
we shall omit the details.

As another example, suppose that the capacity function is linear, that is,
$\phi_j(n) = \phi_jn$, and that service times are exponentially distributed. In this
case, the total number of customers in the system has a Poisson distribution
with mean $\sum_{j=1}^{J}(a_j/\phi_j)$ and it is elementary to show that the optimal
allocation subject to (14.3) is given by

$$\phi_j = \frac{F}{f_j}\,\frac{\sqrt{f_ja_j}}{\sum_{k=1}^{J}\sqrt{f_ka_k}},\quad j = 1,\ldots,J.$$

It is interesting to note that we get a proportional allocation, $\phi_j/\phi_k = a_j/a_k$,
in this case if (14.3) is replaced by $\sum_{j=1}^{J}\log\phi_j = 1$ (see Exercise 4.1.7 of [10]).
More generally, we might use the constraint

$$\sum_{j=1}^{J} f_j\log(g_j\phi_j) = F$$

to account for ‘decreasing costs’: costs become less with each increase in
capacity. Under this constraint, the optimal allocation is $\phi_j = \Lambda a_j/f_j$, where

$$\log\Lambda = \left(F - \sum_{k=1}^{J} f_k\log(g_ka_k/f_k)\right)\Big/\left(\sum_{k=1}^{J} f_k\right).$$
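The two allocations for the linear-capacity case can be sketched in Python as follows. The sketch assumes the objective is the Poisson mean $\sum_j a_j/\phi_j$, that the linear-cost rule is $\phi_j = (F/f_j)\sqrt{f_ja_j}/\sum_k\sqrt{f_ka_k}$, and that the logarithmic constraint has the form $\sum_j f_j\log(g_j\phi_j) = F$ with allocation $\phi_j = \Lambda a_j/f_j$; the constants $g_j$, the function names and the numbers are assumptions made for illustration.

```python
from math import exp, log, sqrt


def mean_customers(a, phi):
    """Mean total number of customers when the capacity at queue j is phi_j * n."""
    return sum(aj / pj for aj, pj in zip(a, phi))


def linear_cost_allocation(a, f, F):
    """Minimize sum a_j / phi_j subject to sum f_j phi_j = F (square-root rule)."""
    denom = sum(sqrt(fk * ak) for fk, ak in zip(f, a))
    return [F / fj * sqrt(fj * aj) / denom for fj, aj in zip(f, a)]


def log_cost_allocation(a, f, g, F):
    """Allocation phi_j = Lambda * a_j / f_j under sum f_j log(g_j phi_j) = F."""
    log_lam = (F - sum(fk * log(gk * ak / fk) for fk, gk, ak in zip(f, g, a))) / sum(f)
    lam = exp(log_lam)
    return [lam * aj / fj for aj, fj in zip(a, f)]


# Hypothetical two-queue example with unit costs.
a = [1.0, 4.0]
f = [1.0, 1.0]
phi_linear = linear_cost_allocation(a, f, F=6.0)
phi_log = log_cost_allocation(a, f, g=[1.0, 1.0], F=1.0)
```

With unit costs the logarithmic constraint gives capacities in proportion to the demands, whereas the linear constraint gives capacities in proportion to their square roots.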


14.6 Data networks 


One of the most interesting and useful applications of queueing networks is 
in the area of telecommunications, where they are used to model (among 
other things) data networks. In contrast to circuit-switched networks (see for 
example [15]), where one or more circuits are held simultaneously on several 
links connecting a source and destination node, only one link is used at any 
time by a given transmission in a data network (message- or packet-switched 
network); a transmission is received in its entirety at a given node before 
being transmitted along the next link in its path through the network. If 
the link is at full capacity, packets are stored in a buffer until the link be- 
comes available for use. Thus the network can be modeled as a queueing 
network: the queues are the communications links and the customers are the 
messages. The most important measure of performance of a data network 
is the total delay, the time it takes for a message to reach its destination. 
Using the results presented above, we can optimally assign the link capaci- 
ties (service rates) in order to minimize the expected total delay. We shall 
first explain in detail how the data network can be described by a queueing 
network. 

Suppose that there are $N$ switching nodes, labeled $n = 1, 2, \ldots, N$, and $J$
communications links, labeled $j = 1, 2, \ldots, J$. We assume that all the links
are perfectly reliable and not subject to noise, so that transmission times are
determined by message length. We shall also suppose that the time taken to
switch, buffer, and (if necessary) re-assemble and acknowledge, is negligible
compared with the transmission times. Each message is therefore assumed to
have the same transmission time on all links visited. Transmission times are
assumed to be mutually independent with a common (arbitrary) distribution
having mean $1/\mu$ (bits, say) and variance $\sigma^2$. Traffic entering the network
from external sources is assumed to be Poisson and that which originates
from node $m$ and is destined for node $n$ is offered at rate $\nu_{mn}$; the
origin-destination pair determines the message type. We shall assume that each link
operates under a FCFS discipline and that a total capacity of $\phi_j$ (bits per
second) is assigned to link $j$.

In order to apply the above results, we shall need to make a further assump- 
tion. It is similar to the celebrated independence assumption of Kleinrock [11]. 
As remarked earlier, each message has the same transmission time on all links 
visited. However, numerous simulation results (see for example [11]) suggest 
that, even so, the network behaves as if successive transmission times at
any given link are independent. We shall therefore suppose that transmission 
times at any given link are independent and that transmission times at differ- 
ent links are independent. This phenomenon can be explained by observing 
that the arrival process at a given link is the result of the superposition of a 
generally large number of streams, which are themselves the result of thin- 
ning the output from other links. The approximation can therefore be justified 
on the basis of limit theorems concerning the thinning and superposition of 
marked point processes; see [3, 4, 5], and the references therein. Kleinrock’s 
assumption differs from ours only in that he assumes the transmission-time 
distribution at a given link $j$ is exponential with common mean $1/\mu$, a natural
consequence of the usual teletraffic modeling assumption that messages ema- 
nating from outside the network are independent and identically distributed 
exponential random variables. However, although the exponential assump- 
tion is usually valid in circuit-switched networks, we should not expect it 
to be appropriate in the current context of message/packet switching, since 
packets are of similar length. Thus it is more realistic to assume, as we do 
here, that message lengths have an arbitrary distribution. 
For each origin-destination (ordered) pair $(m,n)$, let

$$R(m,n) = \{r_{mn}(1), r_{mn}(2), \ldots, r_{mn}(s_{mn})\}$$

be the ordered sequence of links used by messages on that route; $s_{mn}$ is the
number of links and $r_{mn}(s)$ is the link used at stage $s$. Let $\alpha_j(m,n,s) = \nu_{mn}$
if $r_{mn}(s) = j$, and 0 otherwise, so that the arrival rate at link $j$ is given by
$\alpha_j = \sum_m \sum_{n \neq m} \sum_s \alpha_j(m,n,s)$, and the demand (in bits per second) by
$a_j = \alpha_j/\mu$. Assume that the system is stable ($\alpha_j < \mu\phi_j$ for each $j$). The
optimal capacity allocation $(\phi_j,\ j = 1, 2, \ldots, J)$ can now be obtained using
the results of Section 14.4. For unit costs, the optimal allocation of capacity
(constrained by $\sum_j \phi_j = 1$) satisfies $\mu\phi_j = \alpha_j + \lambda\sqrt{\alpha_j}$, $j = 1, \ldots, J$, where
$\lambda = (\mu - \sum_{k=1}^{J} \alpha_k)/(\sum_{k=1}^{J} \sqrt{\alpha_k})$, in the case of exponential transmission
times. More generally, in the case where the transmission times have an
arbitrary distribution with mean $1/\mu$ and variance $\sigma^2$, the optimal allocation
satisfies (to first order in $\epsilon$)


VG 
pes aie a te ; 14.10 
Ld; 5 Jaz (« or ‘(ai », a) E ( ) 


where cy = 1 a3? (ay, + 2r Jar) /(ak + A/a)? and € = p20? — 1. 

To illustrate this, consider a symmetric star network, in which a collection
of identical outer nodes communicate via a single central node. Suppose that
there are $J$ outer nodes and thus $J$ communications links. The corresponding
queueing network, where the nodes represent the communications links, is a
fully connected symmetric network. Clearly there are $J(J-1)$ routes, a typical
one being $R(m,n) = \{m,n\}$, where $m \neq n$. Suppose that transmission times
have a common mean $1/\mu$ and variance $\sigma^2$ (for simplicity, set $\mu = 1$), and,
to begin with, suppose that transmission times are exponentially distributed
and that all traffic is offered at the same rate $\nu$. Clearly the optimal allocation
will be $\phi_j = 1/J$, owing to the symmetry of the network. What happens to
the optimal allocation if we alter the traffic offered on one particular route
by a small quantity? Suppose that we alter $\nu_{12}$ by setting $\nu_{12} = \nu + e$. The
arrival rates at links 1 and 2 will then be altered by the same amount $e$. Since
$\mu = 1$ we will have $a_1 = a_2 = \nu + e$ and $a_j = \nu$ for $j = 3, \ldots, J$. The optimal
allocation is easy to evaluate. We find that, for $j = 1, 2$,

$$\phi_j = \nu + e + \frac{(1 - J\nu - 2e)\sqrt{\nu + e}}{(J-2)\sqrt{\nu} + 2\sqrt{\nu + e}} = \frac{1}{J} + \frac{(J-2)(J\nu + 1)}{2J^2\nu}\, e + O(e^2),$$

and for $j = 3, \ldots, J$,

$$\phi_j = \nu + \frac{(1 - J\nu - 2e)\sqrt{\nu}}{(J-2)\sqrt{\nu} + 2\sqrt{\nu + e}} = \frac{1}{J} - \frac{J\nu + 1}{J^2\nu}\, e + O(e^2).$$

Thus, to first order in $e$, there is an $O(1/J)$ decrease in the capacity at all
links in the network, except at links 1 and 2, where there is an $O(1)$ increase
in capacity.
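
The first-order behaviour in the exponential case can be checked numerically. The following is a minimal sketch, assuming unit costs, $\mu = 1$ and the square-root rule $\mu\phi_j = \alpha_j + \lambda\sqrt{\alpha_j}$ with $\lambda = (\mu - \sum_k \alpha_k)/\sum_k \sqrt{\alpha_k}$ stated above. The function name `sqrt_allocation` and the values chosen for $J$, $\nu$ and $e$ are illustrative assumptions, and the coefficients quoted in the comments come from a first-order Taylor expansion of that rule.

```python
import math

def sqrt_allocation(alpha, mu=1.0):
    """Capacities phi_j (summing to 1) from mu*phi_j = alpha_j + lam*sqrt(alpha_j)."""
    lam = (mu - sum(alpha)) / sum(math.sqrt(a) for a in alpha)
    return [(a + lam * math.sqrt(a)) / mu for a in alpha]

# Illustrative values (assumptions): J links, common rate nu, small perturbation e.
J, nu, e = 10, 0.05, 1e-6

base = sqrt_allocation([nu] * J)                           # symmetric network
pert = sqrt_allocation([nu + e, nu + e] + [nu] * (J - 2))  # links 1 and 2 perturbed

# Finite-difference estimates of the first-order coefficients of e.
slope_12 = (pert[0] - base[0]) / e
slope_rest = (pert[2] - base[2]) / e

print(base[0])     # symmetric allocation, 1/J
print(slope_12)    # compare with (J - 2)(J*nu + 1) / (2 J^2 nu)
print(slope_rest)  # compare with -(J*nu + 1) / (J^2 nu)
```

Because the capacities always sum to one, the two slopes must satisfy $2 \times$ `slope_12` $+ (J-2) \times$ `slope_rest` $\approx 0$, which gives a quick sanity check on any such calculation.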

When the transmission times are not exponentially distributed, similar
results can be obtained. For example, suppose that the transmission times
have a distribution whose squared coefficient of variation is 2 (such as a
mixture of exponential distributions). Then it can be shown that the optimal
allocation is given for $j = 1, 2$ by

$$\phi_j = \frac{1}{J} + \frac{1}{2}\,\frac{(J^2\nu^2 - J\nu + 2)(J^2\nu^2 - 2J\nu - 1)}{J^2\nu}\, e + O(e^2)$$

and for $3 \le j \le J$ by

$$\phi_j = \frac{1}{J} - \frac{(J-2)(J^2\nu^2 - J\nu + 2)(J^2\nu^2 - 2J\nu - 1)}{4J^2\nu}\, e + O(e^2).$$

Thus, to first order in $e$, there is an $O(J^3)$ decrease in the capacity at all
links in the network, except at links 1 and 2, where there is an $O(J^2)$ increase
in capacity. Indeed, the latter is true whenever the squared coefficient of
variation $c$ is not equal to 1, for it is easily checked that $\phi_j = 1/J + g_J(c)e + O(e^2)$,
$j = 1, 2$, and $\phi_j = 1/J - (J/2 - 1)g_J(c)e + O(e^2)$, $j = 3, \ldots, J$, where

$$g_J(c) = \frac{J\nu(J\nu - 1)^3 c - (J^4\nu^4 - 3J^3\nu^3 + 3J^2\nu^2 + J\nu + 2)}{2J^2\nu}.$$

Clearly $g_J(c)$ is $O(J^2)$. It is also an increasing function of $c$, and so this accords
with our previous general results on varying the coefficient of variation of the
service-time distribution.
14.7 Conclusions


We have considered the problem of how best to assign service capacity in 
a queueing network so as to minimize the expected number of customers in 
the network subject to a cost constraint. We have allowed for different types 
of customers, general service-time distributions, stochastic or deterministic 
routing, and a variety of service regimes. Using an accurate approximation 
for the distribution of queueing times, we derived an explicit expression for 
the optimal allocation to first order in the squared coefficient of variation 
of the service-time distribution. This can be extended to arbitrary
order in a straightforward way using a standard perturbation expansion. We
have illustrated our results with reference to data networks, giving particular 
attention to the symmetric star network. In this context we considered how 
best to assign the link capacities in order to minimize the expected total delay 
of messages in the system. We studied the effect on the optimal allocation 
of varying the offered traffic and the distribution of transmission times. We 
showed that for the symmetric star network, the effect of varying the offered 
traffic is far greater in cases where the distribution of transmission times 
deviates from exponential, and that more allocation is needed at nodes where 
the variation in the transmission times is greatest. 
Chapter 15


Analysis of a simple control policy 
for stormwater management in two 
connected dams 


Julia Piantadosi and Phil Howlett 


Abstract We will consider the management of stormwater storage in a 
system of two connected dams. It is assumed that we have stochastic in- 
put of stormwater to the first dam and that there is regular demand from the 
second dam. We wish to choose a control policy from a simple class of con- 
trol policies that releases an optimal flow of water from the first dam to the 
second dam. The cost of each policy is determined by the expected volume 
of water lost through overflow. 


Key words: Stormwater management, storage dams, eigenvalues, steady- 
state probabilities 


15.1 Introduction 


We will analyze the management of stormwater storage in interconnected 
dams. Classic works by Moran [4, 5, 6] and Yeo [7, 8] have considered a 
single storage system with independent and identically distributed inputs, 
occurring as a Poisson process. Simple rules were used to determine the in- 
stantaneous release rates and the expected average behavior. These models 
provide a useful background for our analysis of more complicated systems 
with a sequence of interdependent storage systems. 

In this chapter we have developed a discrete-time, discrete-state Markov 
chain model that consists of two connected dams. It is assumed that the 
input of stormwater into the first dam is stochastic and that there is 
regular release from the second dam that reflects the known demand for
water. We wish to find a control policy that releases an optimal flow of water 
from the first dam to the second dam. In the first instance we have restricted 
our attention to a very simple class of control policies. To calculate the cost of 
a particular policy it is necessary to find an invariant measure. This measure 
is found as the eigenvector of a large transposed transition matrix. A key 
finding is that for our simple class of control policies the eigenvector of the 
large matrix can be found from the corresponding eigenvector of a small block 
matrix. The cost of a particular policy will depend on the expected volume of 
water that is wasted and on the pumping costs. An appropriate cost function 
will assist in determining an optimal pumping policy for our system. 

This work will be used to analyze water cycle management in a suburban 
housing development at Mawson Lakes in South Australia. The intention is to 
capture and treat all stormwater entering the estate. The reclaimed water will 
be supplied to all residential and commercial sites for watering of parks and 
gardens and other non-potable usage. Since this is a preliminary investigation 
we have been mainly concerned with the calculation of steady-state solutions 
for different levels of control in a class of practical management policies. The 
cost of each policy is determined by the expected volume of water lost through 
overflow. We have ignored pumping costs. A numerical example is used to 
illustrate the theoretical solution presented in the chapter. For underlying 
methodology see [1, 2]. 


15.2 A discrete-state model 


15.2.1 Problem description 


Consider a system with two connected dams, $D_1$ and $D_2$, each of finite
capacity. The content of the first dam is denoted by $Z_1 \in \{0, 1, \ldots, h\}$ and
the content of the second dam by $Z_2 \in \{0, 1, \ldots, k\}$. We assume a stochastic
supply of untreated stormwater to the first dam and a regular demand for
treated stormwater from the second dam. The system is controlled by pumping
water from the first dam into the second dam. The input to the first
dam is denoted by $X_1$ and the input to the second dam by $X_2$. We have
formulated a discrete-state model in which the state of the system, at time $t$,
is an ordered pair $(z_{1,t}, z_{2,t})$ specifying the content of the two dams before
pumping. We will consider a class of simple control policies. If the content of
the first dam is greater than or equal to a specified level $U_1 = m$, then we
will pump precisely $m$ units of water from the first dam to the second dam.
If the content of the first dam is below this level we do not pump any water
into the second dam. The parameter $m$ is the control parameter for the class
of policies we wish to study. We assume a constant demand for treated water
from the second dam and pump a constant volume $U_2 = 1$ unit from the
second dam provided the dam is not empty. The units of measurement are
chosen to be the daily level of demand.
15.2.2 The transition matrix for a specific control policy


We need to begin by describing the transitions. We consider the following
cases:


for the state $(z_1, 0)$ where $z_1 < m$ we do not pump from either dam. If $n$
units of stormwater enter the first dam then

$$(z_1, 0) \to (\min([z_1 + n], h),\ 0);$$


for the state $(z_1, z_2)$ where $z_1 < m$ and $0 < z_2$ we do not pump water
from the first dam but we do pump from the second dam. If $n$ units of
stormwater enter the first dam then

$$(z_1, z_2) \to (\min([z_1 + n], h),\ z_2 - 1);$$


for the state $(z_1, 0)$ where $z_1 \ge m$ we pump $m$ units from the first dam
into the second dam. If $n$ units of stormwater enter the system then

$$(z_1, 0) \to (\min([z_1 - m + n], h),\ \min(m, k));$$ and


for the state $(z_1, z_2)$ where $z_1 \ge m$ and $0 < z_2$ we pump $m$ units from the
first dam into the second dam and pump one unit from the second dam to
meet the regular demand. If $n$ units of stormwater enter the system then

$$(z_1, z_2) \to (\min([z_1 - m + n], h),\ \min(z_2 + m - 1, k)).$$


If we order the states $(z_1, z_2)$ by the rules that $(z_1, z_2) \prec (\zeta_1, \zeta_2)$ if $z_2 < \zeta_2$
and $(z_1, z_2) \prec (\zeta_1, z_2)$ if $z_1 < \zeta_1$ then the transition matrix can be written in
the form

$$H(A, B) = \begin{bmatrix}
A & 0 & 0 & \cdots & 0 & B & 0 & \cdots & 0 & 0 & 0 \\
A & 0 & 0 & \cdots & 0 & B & 0 & \cdots & 0 & 0 & 0 \\
0 & A & 0 & \cdots & 0 & 0 & B & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & & & & \vdots \\
0 & 0 & 0 & \cdots & A & 0 & 0 & \cdots & 0 & B & 0 \\
0 & 0 & 0 & \cdots & 0 & A & 0 & \cdots & 0 & 0 & B \\
0 & 0 & 0 & \cdots & 0 & 0 & A & \cdots & 0 & 0 & B \\
\vdots & & & & & & & & & & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & A & 0 & B \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & A & B
\end{bmatrix}.$$
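The block matrices $A = [a_{i,j}]$ and $B = [b_{i,j}]$ for $i, j \in \{0, 1, \ldots, h\}$ are defined
by

$$a_{i,j} = \begin{cases}
0 & \text{for } 1 \le i \le m-1,\ j < i \\
p_j & \text{for } i = 0,\ 0 \le j \le h-1 \\
p_{j-i} & \text{for } 1 \le i \le m-1,\ 1 \le j \le h-1 \text{ and } i \le j \\
p^+_{h-i} & \text{for } j = h,\ 0 \le i \le m-1 \\
0 & \text{for } m \le i \le h,\ 0 \le j \le h
\end{cases}$$

and

$$b_{i,j} = \begin{cases}
0 & \text{for } 0 \le i \le m-1,\ 0 \le j \le h \\
p_j & \text{for } i = m,\ 0 \le j \le h-1 \\
p_{j-i+m} & \text{for } m+1 \le i \le h,\ 0 \le j \le h-1 \text{ and } i-m \le j \\
p^+_{h-i+m} & \text{for } j = h,\ m \le i \le h \\
0 & \text{for } m+1 \le i \le h,\ j < i-m,
\end{cases}$$

where $p_r$ is the probability that $r$ units of stormwater will flow into the first
dam, and $p_r^+ = \sum_{s=r}^{\infty} p_s$. Note that $A, B \in \mathbb{R}^{(h+1)\times(h+1)}$ and $H \in \mathbb{R}^{n \times n}$,
where $n = (h+1)(k+1)$.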
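
The four transition rules and the block definitions can be cross-checked mechanically. The sketch below is illustrative only: it assumes the geometric inflow law $p_r = (1/2)^{r+1}$ (the one used later in the numerical example), and the helper names `inflow`, `tail`, `build_H`, `block_A`, `block_B` and `block` are hypothetical. It builds $H$ directly from the transition rules, builds $A$ and $B$ from their entrywise definitions, and compares the two.

```python
from fractions import Fraction

def inflow(r):
    """Assumed inflow law p_r = (1/2)^(r+1)."""
    return Fraction(1, 2 ** (r + 1))

def tail(r):
    """p_r^+ = sum of p_s over s >= r; equals (1/2)^r for the law above."""
    return Fraction(1, 2 ** r)

def build_H(h, k, m):
    """Transition matrix over states (z1, z2), ordered by z2 first, then z1."""
    size = (h + 1) * (k + 1)
    H = [[Fraction(0)] * size for _ in range(size)]
    for z2 in range(k + 1):
        for z1 in range(h + 1):
            pump = m if z1 >= m else 0
            if pump:
                new_z2 = min(m, k) if z2 == 0 else min(z2 + m - 1, k)
            else:
                new_z2 = max(z2 - 1, 0)
            base = z1 - pump
            for new_z1 in range(base, h + 1):
                r = new_z1 - base                      # units of inflow
                prob = tail(r) if new_z1 == h else inflow(r)
                H[z2 * (h + 1) + z1][new_z2 * (h + 1) + new_z1] += prob
    return H

def block_A(h, m):
    """Block A from its entrywise definition (rows i < m are non-zero)."""
    A = [[Fraction(0)] * (h + 1) for _ in range(h + 1)]
    for i in range(m):
        for j in range(i, h):
            A[i][j] = inflow(j - i)
        A[i][h] = tail(h - i)
    return A

def block_B(h, m):
    """Block B from its entrywise definition (rows i >= m are non-zero)."""
    B = [[Fraction(0)] * (h + 1) for _ in range(h + 1)]
    for i in range(m, h + 1):
        for j in range(i - m, h):
            B[i][j] = inflow(j - i + m)
        B[i][h] = tail(h - i + m)
    return B

def block(H, h, r, c):
    """Extract the (r, c) block of size (h+1) x (h+1) from H."""
    n = h + 1
    return [row[c * n:(c + 1) * n] for row in H[r * n:(r + 1) * n]]

h, k, m = 6, 4, 2
H = build_H(h, k, m)
A, B = block_A(h, m), block_B(h, m)
print(all(sum(row) == 1 for row in H))                  # rows are distributions
print(block(H, h, 0, 0) == A, block(H, h, 0, m) == B)   # block row 0 holds A and B
```

Exact rational arithmetic is used so that the row sums and block comparisons are exact rather than subject to rounding.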


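15.2.3 Calculating the steady state when $1 < m < k$


We suppose that the level of control $m$ is fixed and write $H_m = H \in \mathbb{R}^{n \times n}$.
The steady state $x[m] = x \in \mathbb{R}^n$ is the vector of state probabilities
determined by the non-negative eigenvector of the transposed transition matrix
$K = H^T$ corresponding to the unit eigenvalue. Thus we find $x$ by solving the
equation

$$Kx = x \quad \text{subject to the conditions} \quad x \ge 0 \quad \text{and} \quad \mathbf{1}^T x = 1. \tag{15.1}$$

If we define $C = A^T$ and $D = B^T$ then the matrix $K$ can be written in block
form as $K = F(C) + G_m(D) = F + G_m$, where

$$F = \begin{array}{c|ccccc}
0 & C & C & 0 & \cdots & 0 \\
1 & 0 & 0 & C & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
k-1 & 0 & 0 & 0 & \cdots & C \\
k & 0 & 0 & 0 & \cdots & 0
\end{array}$$

and

$$G_m = \begin{array}{c|cccccccc}
0 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
m-1 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
m & D & D & 0 & \cdots & 0 & 0 & \cdots & 0 \\
m+1 & 0 & 0 & D & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
k-1 & 0 & 0 & 0 & \cdots & D & 0 & \cdots & 0 \\
k & 0 & 0 & 0 & \cdots & 0 & D & \cdots & D
\end{array}$$

Therefore Equation (15.1) can be rewritten as

$$[F + G_m]x = x$$

and by substituting $y = [I - F]x$ and rearranging, this becomes

$$G_m[I - F]^{-1}y = y. \tag{15.2}$$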


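To solve this equation we make some preliminary calculations. We will show
later that the inverse matrices used in the sequel are well defined. From the
Neumann expansion

$$(I - F)^{-1} = I + F + F^2 + \cdots,$$

we deduce that

$$(I - F)^{-1} = \begin{bmatrix}
P & PC & PC^2 & \cdots & PC^{k-1} & PC^k \\
0 & I & C & \cdots & C^{k-2} & C^{k-1} \\
0 & 0 & I & \cdots & C^{k-3} & C^{k-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & I & C \\
0 & 0 & 0 & \cdots & 0 & I
\end{bmatrix},$$

where we have written $P = (I - C)^{-1}$. It follows that

$$G_m(I - F)^{-1} = \begin{bmatrix} 0 & 0 \\ R & S \end{bmatrix},$$

where $S = [S_0, S_1, \ldots, S_{k-m}]$ is a block matrix with columns consisting
of $k - m + 1$ blocks given by

$$S_0 = \begin{bmatrix} DPC^{m-1} \\ DC^{m-2} \\ \vdots \\ D \\ 0 \\ 0 \\ \vdots \end{bmatrix}, \quad
S_1 = \begin{bmatrix} DPC^{m} \\ DC^{m-1} \\ \vdots \\ DC \\ D \\ 0 \\ \vdots \end{bmatrix}, \quad \ldots,$$

$$S_{k-2m+1} = \begin{bmatrix} DPC^{k-m} \\ DC^{k-m-1} \\ \vdots \\ DC^{k-2m+1} \\ DC^{k-2m} \\ \vdots \\ DC \\ D \end{bmatrix}, \quad
S_{k-2m+2} = \begin{bmatrix} DPC^{k-m+1} \\ DC^{k-m} \\ \vdots \\ DC^{k-2m+2} \\ DC^{k-2m+1} \\ \vdots \\ DC^2 \\ D(I + C) \end{bmatrix}, \quad \ldots,$$

and finally

$$S_{k-m} = \begin{bmatrix} DPC^{k-1} \\ DC^{k-2} \\ \vdots \\ DC^{k-m} \\ DC^{k-m-1} \\ \vdots \\ DC^{m} \\ D(I + C + \cdots + C^{m-1}) \end{bmatrix}.$$

By writing the matrix equation (15.2) in partitioned form

$$\begin{bmatrix} 0 & 0 \\ R & S \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix}$$

it can be seen that $u = 0$ and that our original problem has reduced to solving
the matrix equation

$$(I - S)v = 0. \tag{15.3}$$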
Thus we must find the eigenvector for $S$ corresponding to the unit eigenvalue.
To properly describe the elimination process we need to establish some
suitable notation. We will write $S = [S_{i,j}]$ where $i, j \in \{0, 1, \ldots, (k-m)\}$ and

$$S_{i,j} = \begin{cases}
DPC^{m-1+j} & \text{for } i = 0 \\
DC^{m-1-i+j} & \text{for } 1 \le i \le k-m-1 \text{ and } i-m+1 \le j \\
D(I + C + \cdots + C^{j-k+2m-1}) & \text{for } i = k-m \text{ and } k-2m+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

We note that

$$S_j = \begin{bmatrix} S_{0,j} \\ S_{1,j} \\ \vdots \\ S_{k-m,j} \end{bmatrix}$$

and that $\mathbf{1}^T S_j = \mathbf{1}^T$ for each $j = 0, 1, \ldots, k-m$. Hence $S$ is a stochastic
matrix. One of our key findings is that we can use Gaussian elimination to
further reduce the problem from one of finding an eigenvector for the large
matrix $S \in \mathbb{R}^{(h+1)(k-m+1) \times (h+1)(k-m+1)}$ to one of finding the corresponding
eigenvector for a small block matrix in $\mathbb{R}^{(h+1) \times (h+1)}$.
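
The claim that every block column of $S$ has unit column sums can be verified in exact arithmetic for a small instance. The following is a sketch under stated assumptions: the inflow law $p_r = (1/2)^{r+1}$ and the parameters $h = 6$, $k = 4$, $m = 2$ are chosen for illustration, all helper names are hypothetical, and the blocks are formed from the closed-form expressions for $S_{i,j}$ given above (valid for $1 < m < k$).

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def mat_add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def mat_pow(X, e):
    out = identity(len(X))
    for _ in range(e):
        out = mat_mul(out, X)
    return out

def inverse(X):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(X)
    aug = [list(row) + unit for row, unit in zip(X, identity(n))]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def blocks_C_D(h, m):
    """C = A^T and D = B^T for the assumed inflow law p_r = (1/2)^(r+1)."""
    n = h + 1
    A = [[Fraction(0)] * n for _ in range(n)]
    B = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        target, shift = (A, i) if i < m else (B, i - m)
        for j in range(shift, h):
            target[i][j] = Fraction(1, 2 ** (j - shift + 1))
        target[i][h] = Fraction(1, 2 ** (h - shift))    # lumped tail probability
    return [list(c) for c in zip(*A)], [list(c) for c in zip(*B)]

def S_block(i, j, m, k, C, D, P):
    """Block S_{i,j} from the closed-form expressions (requires 1 < m < k)."""
    n = len(C)
    if m + j <= i:
        return [[Fraction(0)] * n for _ in range(n)]
    if i == 0:
        return mat_mul(mat_mul(D, P), mat_pow(C, m - 1 + j))
    if i == k - m:
        acc = [[Fraction(0)] * n for _ in range(n)]
        for t in range(j - k + 2 * m):                  # t = 0, ..., j-k+2m-1
            acc = mat_add(acc, mat_pow(C, t))
        return mat_mul(D, acc)
    return mat_mul(D, mat_pow(C, m - 1 - i + j))

h, k, m = 6, 4, 2
C, D = blocks_C_D(h, m)
I_minus_C = [[a - b for a, b in zip(r, s)] for r, s in zip(identity(h + 1), C)]
P = inverse(I_minus_C)

column_sums = []
for j in range(k - m + 1):
    total = [Fraction(0)] * (h + 1)
    for i in range(k - m + 1):
        blk = S_block(i, j, m, k, C, D, P)
        total = [t + sum(col) for t, col in zip(total, zip(*blk))]
    column_sums.append(total)

print(all(s == 1 for sums in column_sums for s in sums))
```

The same helpers could be reused to form $I - S$ explicitly, although for larger $k$ the block elimination described in the next section avoids ever assembling the full matrix.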


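15.2.4 Calculating the steady state for $m = 1$


For the special case when $m = 1$ we have the block matrix $G_1$ with the
following structure:

$$G_1 = \begin{array}{c|ccccc}
0 & 0 & 0 & 0 & \cdots & 0 \\
1 & D & D & 0 & \cdots & 0 \\
2 & 0 & 0 & D & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
k & 0 & 0 & 0 & \cdots & D
\end{array}$$

In this case we have

$$G_1(I - F)^{-1} = \begin{bmatrix} 0 & 0 \\ R & S \end{bmatrix},$$

where $S \in \mathbb{R}^{(h+1)k \times (h+1)k}$ is given by

$$S = \begin{bmatrix}
DP & DPC & DPC^2 & \cdots & DPC^{k-2} & DPC^{k-1} \\
0 & D & DC & \cdots & DC^{k-3} & DC^{k-2} \\
0 & 0 & D & \cdots & DC^{k-4} & DC^{k-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & D & DC \\
0 & 0 & 0 & \cdots & 0 & D
\end{bmatrix}.$$

We now wish to solve

$$(I - S)v = 0.$$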


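15.2.5 Calculating the steady state for $m = k$


For the case when $m = k$ we have

$$G_k = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 0 \\
D & D & D & \cdots & D
\end{bmatrix}.$$

Therefore

$$G_k(I - F)^{-1} = \begin{bmatrix} 0 & 0 \\ R & S \end{bmatrix},$$

where $S = DP \in \mathbb{R}^{(h+1) \times (h+1)}$ and so we wish to solve

$$(I - DP)v = 0.$$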


15.3 Solution of the matrix eigenvalue problem using Gaussian elimination for $1 < m < k$


We wish to find the eigenvector corresponding to the unit eigenvalue for the
matrix $S$. We use Gaussian elimination in a block matrix format. During the
elimination we will make repeated use of the following elementary formulae.


Lemma 1. If $W = (I - V)^{-1}$ then $W = I + WV$ and $WV^r = V^rW$ for
all non-negative integers $r$.
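
A short verification of Lemma 1 is sketched here for convenience; it uses only the definition of $W$ as a two-sided inverse of $I - V$.

```latex
\begin{align*}
W(I - V) = I \;&\Longrightarrow\; W = I + WV, \\
(I - V)W = I \;&\Longrightarrow\; W = I + VW, \\
\text{hence } WV = VW \;&\Longrightarrow\; WV^{r} = V^{r}W \quad (r = 0, 1, 2, \ldots).
\end{align*}
```

In the elimination steps the lemma is typically applied in the form $I + VW = W$, with $V$ equal to the block that is subtracted from $I$ in the current pivot.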
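15.3.1 Stage 0


Before beginning the elimination we write $T^{(0)} = [I - S]^{(0)} = I - S^{(0)} = I - S$.
We consider a sub-matrix $M$ from the matrix $T^{(0)}$ consisting of the $(0,0)$,
$(q,0)$, $(0,s)$ and $(q,s)$th elements where $1 \le q \le m-1$ and $1 \le s$. We have

$$M = \begin{bmatrix}
I - DPC^{m-1} & -DPC^{m-1+s} \\
-DC^{m-1-q} & I\delta_{q,s} - DC^{m-1-q+s}
\end{bmatrix}.$$

If we write $W_0 = [I - DPC^{m-1}]^{-1}$ then the standard elimination gives

$$M \to \begin{bmatrix}
I & -W_0DPC^{m-1+s} \\
0 & I\delta_{q,s} - DC^{m-1-q+s} - DC^{m-1-q}W_0DPC^{m-1+s}
\end{bmatrix}$$

$$= \begin{bmatrix}
I & -DPC^{m-1}W_0C^{s} \\
0 & I\delta_{q,s} - DC^{m-1-q}[I + W_0DPC^{m-1}]C^{s}
\end{bmatrix}$$

$$= \begin{bmatrix}
I & -DPC^{m-1}(W_0C)C^{s-1} \\
0 & I\delta_{q,s} - DC^{m-1-q}(W_0C)C^{s-1}
\end{bmatrix}.$$

After stage 0 of the elimination we have a new matrix $T^{(1)} = I - S^{(1)}$ where

$$S^{(1)}_{i,j} = \begin{cases}
0 & \text{for } j = 0 \\
DPC^{m-1}(W_0C)C^{j-1} & \text{for } i = 0,\ 1 \le j \\
DC^{m-1-i}(W_0C)C^{j-1} & \text{for } 1 \le i \le m-1,\ 1 \le j \\
DC^{m-1-i+j} & \text{for } m \le i \le k-m-1 \text{ and } i-m+1 \le j \\
D(I + C + \cdots + C^{j-k+2m-1}) & \text{for } i = k-m \text{ and } k-2m+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

Note that column 0 is reduced to a zero column and that row 0 is fixed for all
subsequent stages. We therefore modify $T^{(1)}$ by dropping both column and
row 0.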


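15.3.2 The general rules for stages 2 to $m - 2$


After stage $p-1$ of the elimination, for $1 \le p \le m-2$, we have $T^{(p)} = I - S^{(p)}$
where

$$S^{(p)}_{i,j} = \begin{cases}
0 & \text{for } j = p-1 \\
DC^{m-p}\prod_{t=0}^{p-1}(W_tC)\,C^{j-p} & \text{for } i = p-1,\ p \le j \\
DC^{m-1-i}\prod_{t=0}^{p-1}(W_tC)\,C^{j-p} & \text{for } p \le i \le m-1,\ p \le j \\
D\prod_{t=i-m+1}^{p-1}(W_tC)\,C^{j-p} & \text{for } m \le i \le m+p-2,\ p \le j \\
DC^{m-1-i+j} & \text{for } m+p-1 \le i \le k-m-1 \text{ and } i-m+1 \le j \\
D\sum_{t=0}^{j-k+2m-1}C^{t} & \text{for } i = k-m,\ k-2m+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

Column $p-1$ is reduced to a zero column and row $p-1$ is fixed for all
subsequent stages. We modify $T^{(p)} = (I - S)^{(p)}$ by dropping both column
and row $p-1$ and consider a sub-matrix $M$ consisting of the $(p,p)$, $(q,p)$,
$(r,p)$, $(m+p-1,p)$, $(p,s)$, $(q,s)$, $(r,s)$ and $(m+p-1,s)$th elements, where
$p+1 \le q \le m-1$, $m \le r \le m+p-2$ and $p+1 \le s$. The sub-matrix $M$ is
given by

$$M = \begin{bmatrix}
I - DC^{m-1-p}\prod_{t=0}^{p-1}(W_tC) & -DC^{m-1-p}\prod_{t=0}^{p-1}(W_tC)\,C^{s-p} \\
-DC^{m-1-q}\prod_{t=0}^{p-1}(W_tC) & I\delta_{q,s} - DC^{m-1-q}\prod_{t=0}^{p-1}(W_tC)\,C^{s-p} \\
-D\prod_{t=r-m+1}^{p-1}(W_tC) & I\delta_{r,s} - D\prod_{t=r-m+1}^{p-1}(W_tC)\,C^{s-p} \\
-D & I\delta_{m+p-1,s} - DC^{s-p}
\end{bmatrix}$$

and if $W_p = [I - DC^{m-1-p}(W_0C)\cdots(W_{p-1}C)]^{-1}$ elimination gives

$$M \to \begin{bmatrix}
I & -DC^{m-1-p}\prod_{t=0}^{p}(W_tC)\,C^{s-p-1} \\
0 & I\delta_{q,s} - DC^{m-1-q}\prod_{t=0}^{p}(W_tC)\,C^{s-p-1} \\
0 & I\delta_{r,s} - D\prod_{t=r-m+1}^{p}(W_tC)\,C^{s-p-1} \\
0 & I\delta_{m+p-1,s} - D(W_pC)C^{s-p-1}
\end{bmatrix}.$$

After stage $p$ of the elimination we have $T^{(p+1)} = I - S^{(p+1)}$ where

$$S^{(p+1)}_{i,j} = \begin{cases}
0 & \text{for } j = p \\
DC^{m-1-p}\prod_{t=0}^{p}(W_tC)\,C^{j-p-1} & \text{for } i = p,\ p+1 \le j \\
DC^{m-1-i}\prod_{t=0}^{p}(W_tC)\,C^{j-p-1} & \text{for } p+1 \le i \le m-1 \text{ and } p+1 \le j \\
D\prod_{t=i-m+1}^{p}(W_tC)\,C^{j-p-1} & \text{for } m \le i \le m+p-1 \text{ and } p+1 \le j \\
DC^{m-1-i+j} & \text{for } m+p \le i \le k-m-1 \text{ and } i-m+1 \le j \\
D\sum_{t=0}^{j-k+2m-1}C^{t} & \text{for } i = k-m \text{ and } k-2m+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

Since column $p$ is reduced to a zero column and row $p$ is fixed for all
subsequent stages we modify $T^{(p+1)}$ by dropping both column and row $p$.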
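15.3.3 Stage $m - 1$


After stage $m-2$ we have $T^{(m-1)} = I - S^{(m-1)}$ where

$$S^{(m-1)}_{i,j} = \begin{cases}
0 & \text{for } j = m-2 \\
DC\prod_{t=0}^{m-2}(W_tC)\,C^{j-m+1} & \text{for } i = m-2,\ m-1 \le j \\
D\prod_{t=0}^{m-2}(W_tC)\,C^{j-m+1} & \text{for } i = m-1,\ m-1 \le j \\
D\prod_{t=i-m+1}^{m-2}(W_tC)\,C^{j-m+1} & \text{for } m \le i \le 2m-3 \text{ and } m-1 \le j \\
DC^{m-1-i+j} & \text{for } 2m-2 \le i \le k-m-1 \text{ and } i-m+1 \le j \\
D\sum_{t=0}^{j-k+2m-1}C^{t} & \text{for } i = k-m \text{ and } k-2m+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

We modify $T^{(m-1)}$ by dropping both column and row $m-2$ and consider a
sub-matrix $M$ consisting of the $(m-1,m-1)$, $(r,m-1)$, $(2m-2,m-1)$,
$(m-1,s)$, $(r,s)$ and $(2m-2,s)$th elements, where $m \le r \le 2m-3$ and
$m \le s$. We have

$$M = \begin{bmatrix}
I - D\prod_{t=0}^{m-2}(W_tC) & -D\prod_{t=0}^{m-2}(W_tC)\,C^{s-m+1} \\
-D\prod_{t=r-m+1}^{m-2}(W_tC) & I\delta_{r,s} - D\prod_{t=r-m+1}^{m-2}(W_tC)\,C^{s-m+1} \\
-D & I\delta_{2m-2,s} - DC^{s-m+1}
\end{bmatrix}.$$

If we write $W_{m-1} = [I - D(W_0C)\cdots(W_{m-2}C)]^{-1}$ then the standard
elimination gives

$$M \to \begin{bmatrix}
I & -D(W_0C)\cdots(W_{m-1}C)C^{s-m} \\
0 & I\delta_{r,s} - D(W_{r-m+1}C)\cdots(W_{m-1}C)C^{s-m} \\
0 & I\delta_{2m-2,s} - D(W_{m-1}C)C^{s-m}
\end{bmatrix}.$$

After stage $m-1$ of the elimination we have $T^{(m)} = I - S^{(m)}$ where

$$S^{(m)}_{i,j} = \begin{cases}
0 & \text{for } j = m-1 \\
D\prod_{t=0}^{m-1}(W_tC)\,C^{j-m} & \text{for } i = m-1,\ m \le j \\
D\prod_{t=i-m+1}^{m-1}(W_tC)\,C^{j-m} & \text{for } m \le i \le 2m-2,\ m \le j \\
DC^{m-1-i+j} & \text{for } 2m-1 \le i \le k-m-1 \text{ and } i-m+1 \le j \\
D\sum_{t=0}^{j-k+2m-1}C^{t} & \text{for } i = k-m,\ k-2m+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

Column $m-1$ is reduced to a zero column and row $m-1$ is fixed for all
subsequent stages. We modify $T^{(m)}$ by dropping both column and row $m-1$.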


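15.3.4 The general rules for stages $m$ to $k - 2m$


After stage $p-1$ for $m \le p \le k-2m$ we have $T^{(p)} = I - S^{(p)}$ where

$$S^{(p)}_{i,j} = \begin{cases}
0 & \text{for } j = p-1 \\
D\prod_{t=p-m}^{p-1}(W_tC)\,C^{j-p} & \text{for } i = p-1,\ p \le j \\
D\prod_{t=i-m+1}^{p-1}(W_tC)\,C^{j-p} & \text{for } p \le i \le m+p-2,\ p \le j \\
DC^{m-1-i+j} & \text{for } m+p-1 \le i \le k-m-1 \text{ and } i-m+1 \le j \\
D\sum_{t=0}^{j-k+2m-1}C^{t} & \text{for } i = k-m,\ k-2m+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

We modify $T^{(p)}$ by dropping both column and row $p-1$ and consider a
sub-matrix $M$ using the $(p,p)$, $(r,p)$, $(m+p-1,p)$, $(p,s)$, $(r,s)$ and $(m+p-1,s)$th
elements, where $p+1 \le r \le m+p-2$ and $p+1 \le s$. We have

$$M = \begin{bmatrix}
I - D\prod_{t=p-m+1}^{p-1}(W_tC) & -D\prod_{t=p-m+1}^{p-1}(W_tC)\,C^{s-p} \\
-D\prod_{t=r-m+1}^{p-1}(W_tC) & I\delta_{r,s} - D\prod_{t=r-m+1}^{p-1}(W_tC)\,C^{s-p} \\
-D & I\delta_{m+p-1,s} - DC^{s-p}
\end{bmatrix}$$

and if we write $W_p = [I - D(W_{p-m+1}C)\cdots(W_{p-1}C)]^{-1}$ then the standard
elimination gives

$$M \to \begin{bmatrix}
I & -D(W_{p-m+1}C)\cdots(W_pC)C^{s-p-1} \\
0 & I\delta_{r,s} - D(W_{r-m+1}C)\cdots(W_pC)C^{s-p-1} \\
0 & I\delta_{m+p-1,s} - D(W_pC)C^{s-p-1}
\end{bmatrix}.$$

After stage $p$ of the elimination we have $T^{(p+1)} = I - S^{(p+1)}$ where

$$S^{(p+1)}_{i,j} = \begin{cases}
0 & \text{for } j = p \\
D\prod_{t=i-m+1}^{p}(W_tC)\,C^{j-p-1} & \text{for } p \le i \le m+p-1 \text{ and } p+1 \le j \\
DC^{m-1-i+j} & \text{for } m+p \le i \le k-m-1 \text{ and } i-m+1 \le j \\
D\sum_{t=0}^{j-k+2m-1}C^{t} & \text{for } i = k-m \text{ and } k-2m+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

We now modify $T^{(p+1)}$ by dropping both column and row $p$.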
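15.3.5 Stage $k - 2m + 1$


To reduce the cumbersome notation it is convenient to write $p = k-2m+1$.
After stage $k-2m$ we have $T^{(p)} = I - S^{(p)}$ where

$$S^{(p)}_{i,j} = \begin{cases}
0 & \text{for } j = k-2m \\
D\prod_{t=k-3m+1}^{k-2m}(W_tC)\,C^{j-p} & \text{for } i = k-2m \text{ and } p \le j \\
D\prod_{t=i-m+1}^{k-2m}(W_tC)\,C^{j-p} & \text{for } p \le i \le k-m-1 \text{ and } p \le j \\
D\sum_{t=0}^{j-p}C^{t} & \text{for } i = k-m,\ p \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

We modify $T^{(p)}$ by dropping both column and row $k-2m$. We consider
a sub-matrix $M$ consisting of the $(p,p)$, $(r,p)$, $(k-m,p)$, $(p,s)$, $(r,s)$ and
$(k-m,s)$th elements, where $p+1 \le r \le k-m-1$ and $p+1 \le s$. We have

$$M = \begin{bmatrix}
I - D\prod_{t=p-m+1}^{p-1}(W_tC) & -D\prod_{t=p-m+1}^{p-1}(W_tC)\,C^{s-p} \\
-D\prod_{t=r-m+1}^{p-1}(W_tC) & I\delta_{r,s} - D\prod_{t=r-m+1}^{p-1}(W_tC)\,C^{s-p} \\
-D & I\delta_{k-m,s} - D\sum_{t=0}^{s-p}C^{t}
\end{bmatrix}.$$

We write $W_p = [I - D(W_{p-m+1}C)\cdots(W_{p-1}C)]^{-1}$. In order to describe the
final stages of the elimination more easily we define

$$X_{k-2m} = 0 \quad \text{and set} \quad X_{k-2m+r+1} = I + X_{k-2m+r}(W_{k-2m+r}C) \tag{15.4}$$

for each $r = 0, \ldots, m-1$. With this notation the standard elimination gives

$$M \to \begin{bmatrix}
I & -D\prod_{t=p-m+1}^{p}(W_tC)\,C^{s-p-1} \\
0 & I\delta_{r,s} - D\prod_{t=r-m+1}^{p}(W_tC)\,C^{s-p-1} \\
0 & I\delta_{k-m,s} - D\left\{\sum_{t=0}^{s-p-1}C^{t} + (X_pW_pC)C^{s-p-1}\right\}
\end{bmatrix}.$$

After stage $p = k-2m+1$ of the elimination we have $T^{(p+1)} = I - S^{(p+1)}$
where

$$S^{(p+1)}_{i,j} = \begin{cases}
0 & \text{for } j = p \\
D\prod_{t=i-m+1}^{p}(W_tC)\,C^{j-p-1} & \text{for } p \le i \le k-m-1 \text{ and } p+1 \le j \\
D\left\{\sum_{t=0}^{j-p-2}C^{t} + X_{p+1}C^{j-p-1}\right\} & \text{for } i = k-m,\ p+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

We modify $T^{(k-2m+2)}$ by dropping both column and row $k-2m+1$.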
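15.3.6 The general rule for stages $k - 2m + 2$ to $k - m - 2$


After stage $p-1$ for $k-2m+2 \le p \le k-m-2$ we have $T^{(p)} = I - S^{(p)}$
where

$$S^{(p)}_{i,j} = \begin{cases}
0 & \text{for } j = p-1 \\
D\prod_{t=p-m}^{p-1}(W_tC)\,C^{j-p} & \text{for } i = p-1,\ p \le j \\
D\prod_{t=i-m+1}^{p-1}(W_tC)\,C^{j-p} & \text{for } p \le i \le k-m-1,\ p \le j \\
D\left\{\sum_{t=0}^{j-p-1}C^{t} + X_pC^{j-p}\right\} & \text{for } i = k-m,\ p \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

We modify $T^{(p)}$ by dropping both column and row $p-1$ and consider a
sub-matrix $M$ using the $(p,p)$, $(r,p)$, $(k-m,p)$, $(p,s)$, $(r,s)$ and $(k-m,s)$th
elements, where $p+1 \le r \le k-m-1$ and $p+1 \le s$. We have $M$ given by

$$M = \begin{bmatrix}
I - D\prod_{t=p-m+1}^{p-1}(W_tC) & -D\prod_{t=p-m+1}^{p-1}(W_tC)\,C^{s-p} \\
-D\prod_{t=r-m+1}^{p-1}(W_tC) & I\delta_{r,s} - D\prod_{t=r-m+1}^{p-1}(W_tC)\,C^{s-p} \\
-DX_p & I\delta_{k-m,s} - D\left\{\sum_{t=0}^{s-p-1}C^{t} + X_pC^{s-p}\right\}
\end{bmatrix}$$

and if we write $W_p = [I - D(W_{p-m+1}C)\cdots(W_{p-1}C)]^{-1}$ then the standard
elimination gives

$$M \to \begin{bmatrix}
I & -D\prod_{t=p-m+1}^{p}(W_tC)\,C^{s-p-1} \\
0 & I\delta_{r,s} - D\prod_{t=r-m+1}^{p}(W_tC)\,C^{s-p-1} \\
0 & I\delta_{k-m,s} - D\left\{\sum_{t=0}^{s-p-2}C^{t} + X_{p+1}C^{s-p-1}\right\}
\end{bmatrix}.$$

After stage $p$ of the elimination we have $T^{(p+1)} = I - S^{(p+1)}$ where

$$S^{(p+1)}_{i,j} = \begin{cases}
0 & \text{for } j = p \\
D\prod_{t=i-m+1}^{p}(W_tC)\,C^{j-p-1} & \text{for } p \le i \le k-m-1 \text{ and } p+1 \le j \\
D\left\{\sum_{t=0}^{j-p-2}C^{t} + X_{p+1}C^{j-p-1}\right\} & \text{for } i = k-m,\ p+1 \le j \\
0 & \text{for } m + j \le i.
\end{cases}$$

We again modify $T^{(p+1)}$ by dropping both column and row $p$.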
15.3.7 The final stage $k - m - 1$


The matrix $S^{(k-m-1)}$ is given by

$$S^{(k-m-1)} = \begin{bmatrix}
D\prod_{t=k-2m}^{k-m-2}(W_tC) & D\prod_{t=k-2m}^{k-m-2}(W_tC)\,C \\
DX_{k-m-1} & D[I + (X_{k-m-1}C)]
\end{bmatrix}.$$

Hence

$$T^{(k-m-1)} = \begin{bmatrix}
I - D\prod_{t=k-2m}^{k-m-2}(W_tC) & -D\prod_{t=k-2m}^{k-m-2}(W_tC)\,C \\
-DX_{k-m-1} & I - D[I + (X_{k-m-1}C)]
\end{bmatrix}.$$

If we write $W_{k-m-1} = [I - D(W_{k-2m}C)\cdots(W_{k-m-2}C)]^{-1}$ then elimination
gives

$$T^{(k-m)} = \begin{bmatrix}
I & -D(W_{k-2m}C)\cdots(W_{k-m-1}C) \\
0 & I - DX_{k-m}
\end{bmatrix}.$$

Since the original system is singular and since we show later in this chapter
that the matrices $W_0, \ldots, W_{k-m-1}$ are well defined, we know that the final
pivot element $I - DX_{k-m}$ must also be singular. Therefore the original system
can be solved by finding the eigenvector corresponding to the unit eigenvalue
for the matrix $DX_{k-m}$ and then using back substitution.


15.4 The solution process using back substitution for $1 < m < k$


After the Gaussian elimination has been completed the system reduces to an
equation for $v_0$,

$$v_0 = DPC^{m-1}(W_0C)\sum_{j=1}^{k-m}C^{j-1}v_j,$$

a set of equations for $v_p$ when $p = 1, 2, \ldots, m-2$,

$$v_p = DC^{m-1-p}\prod_{t=0}^{p}(W_tC)\sum_{j=p+1}^{k-m}C^{j-p-1}v_j,$$

a set of equations for $v_q$ when $q = m-1, m, \ldots, k-m-1$,

$$v_q = D\prod_{t=q-m+1}^{q}(W_tC)\sum_{j=q+1}^{k-m}C^{j-q-1}v_j,$$

and finally an equation for $v_{k-m}$

$$(I - DX_{k-m})v_{k-m} = 0.$$
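We begin by solving the final equation to find $v_{k-m}$. The penultimate
equation now shows us that

$$v_{k-m-1} = D\left[\prod_{t=k-2m}^{k-m-1}(W_tC)\right]v_{k-m}.$$

We proceed by induction. We suppose that $m \le q$ and that for all $s$ with
$q \le s < k-m$ we have

$$v_s = D\left[\prod_{t=s-m+1}^{k-m-1}(W_tC)\right]v_{k-m}$$

and

$$\sum_{j=s+1}^{k-m}C^{j-s-1}v_j = \left[\prod_{t=s+1}^{k-m-1}(W_tC)\right]v_{k-m}.$$

The hypothesis is clearly true for $q = k-m-1$. Now we have

$$\begin{aligned}
\sum_{j=q}^{k-m}C^{j-q}v_j &= v_q + C\sum_{j=q+1}^{k-m}C^{j-q-1}v_j \\
&= \left\{D\prod_{t=q-m+1}^{k-m-1}(W_tC) + C\prod_{t=q+1}^{k-m-1}(W_tC)\right\}v_{k-m} \\
&= \left\{\left[D\prod_{t=q-m+1}^{q-1}(W_tC)\right]W_q + I\right\} \times C\left[\prod_{t=q+1}^{k-m-1}(W_tC)\right]v_{k-m} \\
&= \left[\prod_{t=q}^{k-m-1}(W_tC)\right]v_{k-m}
\end{aligned}$$

and hence

$$v_{q-1} = D\prod_{t=q-m}^{q-1}(W_tC)\sum_{j=q}^{k-m}C^{j-q}v_j = D\left[\prod_{t=q-m}^{k-m-1}(W_tC)\right]v_{k-m}.$$

Thus the hypothesis is also true for $m-1 \le q-1 \le s < k-m$. To complete
the solution we note that the pattern changes at this point. We still have

$$\begin{aligned}
\sum_{j=m-1}^{k-m}C^{j-m+1}v_j &= v_{m-1} + C\sum_{j=m}^{k-m}C^{j-m}v_j \\
&= \left\{D\prod_{t=0}^{k-m-1}(W_tC) + C\prod_{t=m}^{k-m-1}(W_tC)\right\}v_{k-m} \\
&= \left\{\left[D\prod_{t=0}^{m-2}(W_tC)\right]W_{m-1} + I\right\} \times C\left[\prod_{t=m}^{k-m-1}(W_tC)\right]v_{k-m} \\
&= \left[\prod_{t=m-1}^{k-m-1}(W_tC)\right]v_{k-m}
\end{aligned}$$

but now we have

$$v_{m-2} = DC\prod_{t=0}^{m-2}(W_tC)\sum_{j=m-1}^{k-m}C^{j-m+1}v_j = DC\left[\prod_{t=0}^{k-m-1}(W_tC)\right]v_{k-m}.$$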
We use induction once more. Let $1 \le p \le m-2$ and for $p \le s \le m-2$ we
suppose that

$$v_s = DC^{m-1-s}\left[\prod_{t=0}^{k-m-1}(W_tC)\right]v_{k-m}$$

and

$$\sum_{j=s+1}^{k-m}C^{j-s-1}v_j = \left[\prod_{t=s+1}^{k-m-1}(W_tC)\right]v_{k-m}.$$

The hypothesis is true for $p = m-2$. Now we have

$$\begin{aligned}
\sum_{j=p}^{k-m}C^{j-p}v_j &= v_p + C\sum_{j=p+1}^{k-m}C^{j-p-1}v_j \\
&= \left\{\left[DC^{m-1-p}\prod_{t=0}^{p-1}(W_tC)\right]W_p + I\right\} \times C\left[\prod_{t=p+1}^{k-m-1}(W_tC)\right]v_{k-m} \\
&= \left[\prod_{t=p}^{k-m-1}(W_tC)\right]v_{k-m}
\end{aligned}$$

and hence

$$v_{p-1} = DC^{m-p}\prod_{t=0}^{p-1}(W_tC)\sum_{j=p}^{k-m}C^{j-p}v_j = DC^{m-p}\left[\prod_{t=0}^{k-m-1}(W_tC)\right]v_{k-m}.$$

Thus the hypothesis is also true for $0 \le p-1 \le s < k-m$. In summary we
have the solution to Equation (15.3) given by

$$v_p = DC^{m-1-p}\left[\prod_{t=0}^{k-m-1}(W_tC)\right]v_{k-m} \tag{15.5}$$

for $p = 0, 1, \ldots, m-2$ and

$$v_q = D\left[\prod_{t=q-m+1}^{k-m-1}(W_tC)\right]v_{k-m} \tag{15.6}$$

for $q = m-1, m, \ldots, k-m-1$. The original solution can now be recovered
through

$$y = \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ v \end{bmatrix} \quad \text{and} \quad x = (I - F)^{-1}y,$$

where $x = x[m]$ is the steady-state vector for the original system using the
control policy with level $m$. The steady-state vector can be used to calculate
the expected amount of water lost from the system when this policy is
implemented. The cost of a particular policy will depend on the expected volume
of water that is wasted and on the pumping costs. This cost will assist in
determining an optimal pumping policy for the system.
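
As an independent check on the block elimination, the steady-state vector can also be approximated by plain power iteration on the transition rules of Section 15.2.2. The sketch below is not the method of this chapter; it is a brute-force alternative under stated assumptions. The inflow law $p_r = (1/2)^{r+1}$ is truncated at 40 terms with the remainder lumped into the last term, the function names are hypothetical, and the overflow measure counts spill from both dams, which is an assumption about how lost water is tallied.

```python
def geometric_inflow(terms=40):
    """Assumed inflow law p_r = (1/2)^(r+1), truncated; remainder lumped at the end."""
    p = [0.5 ** (r + 1) for r in range(terms)]
    p[-1] += 1.0 - sum(p)
    return p

def transitions(z1, z2, h, k, m, p):
    """Yield (next_state, probability, spilled_volume) for one time step."""
    pump = m if z1 >= m else 0
    if pump:
        raw_z2 = pump if z2 == 0 else z2 + pump - 1
    else:
        raw_z2 = max(z2 - 1, 0)
    new_z2 = min(raw_z2, k)
    for n, prob in enumerate(p):
        raw_z1 = z1 - pump + n
        spill = max(raw_z1 - h, 0) + (raw_z2 - new_z2)   # assumption: both dams count
        yield (min(raw_z1, h), new_z2), prob, spill

def steady_state(h, k, m, p, tol=1e-13, max_iter=10000):
    """Approximate the invariant distribution by repeated application of the chain."""
    states = [(z1, z2) for z2 in range(k + 1) for z1 in range(h + 1)]
    x = {s: 1.0 / len(states) for s in states}
    for _ in range(max_iter):
        new = dict.fromkeys(states, 0.0)
        for s in states:
            for t, prob, _spill in transitions(s[0], s[1], h, k, m, p):
                new[t] += x[s] * prob
        total = sum(new.values())
        new = {s: v / total for s, v in new.items()}
        if max(abs(new[s] - x[s]) for s in states) < tol:
            return new
        x = new
    return x

def expected_overflow(x, h, k, m, p):
    """Expected volume spilled per time step under the steady state x."""
    return sum(x[s] * prob * spill
               for s in x
               for _t, prob, spill in transitions(s[0], s[1], h, k, m, p))

p = geometric_inflow()
x = steady_state(6, 4, 2, p)
print(x[(0, 0)], expected_overflow(x, 6, 4, 2, p))
```

With $h = 6$, $k = 4$ and $m = 2$ the dictionary `x` can be compared entry by entry with the steady-state vector reported in the numerical example; repeating the call for each admissible $m$ gives a simple way to compare policies by expected overflow.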


15.5 The solution process for m = 1 


For the case when $m = 1$ the final equation is given by

$$(I - D)v_{k-1} = 0.$$

We will show that $(I - D)^{-1}$ is well defined and hence deduce that $v_{k-1} = 0$.
Since

$$D = \begin{bmatrix} 0 & L_2 \\ 0 & M_2 \end{bmatrix}$$

it follows that $(I - D)^{-1}$ is well defined if and only if $(I - M_2)^{-1}$ is well
defined. We have the following result.

Lemma 2. The matrix $M_2^{k+1}$ is strictly sub-stochastic and $(I - M_2)^{-1}$ is
well defined by the formula

$$(I - M_2)^{-1} = (I + M_2 + M_2^2 + \cdots + M_2^k)(I - M_2^{k+1})^{-1}.$$

Proof. We observe that

$$\mathbf{1}^T M_2 = [p_1^+, 1, 1, \ldots, 1]$$

and suppose that

$$\mathbf{1}^T M_2^r = [\alpha_0, \alpha_1, \ldots, \alpha_{r-1}, 1, 1, \ldots, 1]$$

for each $r = 1, 2, \ldots, q$, where $\alpha_j = \alpha_j(r) \in (0,1)$. Now it follows that


17 Mo**! = [ag,a1,..-,@q-1,1,1,...,1]Me 
= Qo|p1, po, 0,---,0,0,0,...,0] + 
a1[p2,P1,Po,---,0,0,0,...,0]) +--- 
basalt 
Cig 9 |Dga1s PG. Po—39.8s= 5 P05 0,05.0250) + 


Qq—1[Pq Pq—15 Pq—2: tee »P1; Po, 9, oe ., 0] aE 
[Pq-+1;Paq> Pq; +++,P2,P1,P0;-- .,0] ele ais 
oe Hf 


[Pk, Pk—1; Pk—2)-- + }Pk—q4+1;Pk—q) Pk—q—1)-- +» Po] + 
lata Oe? Dea" oo Die SDE gai pie tee pro]. 
The first element in the resultant row matrix is

$$\alpha_0p_1 + \alpha_1p_2 + \cdots + \alpha_{q-1}p_q + p_{q+1}^+ = \beta_0 < 1,$$

and the second element is

$$\alpha_0p_0 + \alpha_1p_1 + \cdots + \alpha_{q-1}p_{q-1} + p_q^+ = \beta_1 < 1.$$

A similar argument shows that the $j$th element is less than 1 for all $j \le q$
and indeed, for the critical case $j = q$, the $j$th element is given by

$$\alpha_{q-1}p_0 + p_1^+ = \beta_q < 1.$$

The remaining elements for $q < j \le k$ are easily seen to be equal to 1. Hence
the hypothesis is also true for $r = q+1$. By induction it follows that

$$\mathbf{1}^T M_2^{k+1} < \mathbf{1}^T.$$

Hence $(I - M_2^{k+1})^{-1}$ is well defined. The matrix $(I - M_2)^{-1}$ is now defined
by the identity above. This completes the proof.


By back substitution into the equation $(I - S)v = 0$ we can see that $v_p = 0$
for $p = 1, \ldots, k-1$ and finally that

$$(I - DP)v_0 = 0.$$

Since $DP$ is a stochastic matrix, the eigenvector $v_0$ corresponding to the unit
eigenvalue can be found. We know that

$$y = \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ v \end{bmatrix}$$

and hence we can calculate the steady-state vector $x = (I - F)^{-1}y$.


15.6 The solution process for m = k 


In this particular case we need to solve the equation

$$(I - DP)v_0 = 0.$$

Hence we can find the eigenvector $v_0$ corresponding to the unit eigenvalue
of the matrix $DP$ and the original steady-state solution $x$ can be recovered
through

$$y = \begin{bmatrix} 0 \\ v_0 \end{bmatrix} \quad \text{and} \quad x = (I - F)^{-1}y.$$


15.7 A numerical example 


We will consider a system of two connected dams with discrete states
$z_1 \in \{0, 1, 2, 3, 4, 5, 6\}$ and $z_2 \in \{0, 1, 2, 3, 4\}$. Assume that the inflow to the
first dam is defined by $p_r = (0.5)^{r+1}$ for $r = 0, 1, \ldots$ and consider the control
policy with $m = 2$. The transition probability matrix has the block matrix
form

\[
H = \begin{bmatrix}
A & 0 & B & 0 & 0 \\
A & 0 & B & 0 & 0 \\
0 & A & 0 & B & 0 \\
0 & 0 & A & 0 & B \\
0 & 0 & 0 & A & B
\end{bmatrix},
\]
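where
\[
A = \begin{bmatrix}
\tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{32} & \tfrac1{64} & \tfrac1{64} \\
0 & \tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{32} & \tfrac1{32} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
and
\[
B = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{32} & \tfrac1{64} & \tfrac1{64} \\
0 & \tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{32} & \tfrac1{32} \\
0 & 0 & \tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{16} \\
0 & 0 & 0 & \tfrac12 & \tfrac14 & \tfrac18 & \tfrac18 \\
0 & 0 & 0 & 0 & \tfrac12 & \tfrac14 & \tfrac14
\end{bmatrix}.
\]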


As explained in Subsection 15.2.3 we solve the equation
\[ (I - S)v = 0, \]
where $S = [S_{ij}]$ for $i, j \in \{0, 1, 2\}$. Using the elimination process described
in Section 15.3 we find the reduced coefficient matrix


\[
(I - S) \rightarrow \begin{bmatrix}
I & -W_0 D (I - C)^{-1} C^2 & -W_0 D (I - C)^{-1} C^3 \\
0 & I & -W_1 D W_0 C^2 \\
0 & 0 & I - D(I + W_1 C)
\end{bmatrix}
\]
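where

\[
D(I + W_1 C) = \begin{bmatrix}
\tfrac{447}{2296} & \tfrac{491}{2296} & \tfrac12 & 0 & 0 & 0 & 0 \\
\tfrac{1021}{4592} & \tfrac{1065}{4592} & \tfrac14 & \tfrac12 & 0 & 0 & 0 \\
\tfrac{1849}{9184} & \tfrac{1805}{9184} & \tfrac18 & \tfrac14 & \tfrac12 & 0 & 0 \\
\tfrac{2677}{18368} & \tfrac{2545}{18368} & \tfrac1{16} & \tfrac18 & \tfrac14 & \tfrac12 & 0 \\
\tfrac{619}{5248} & \tfrac{575}{5248} & \tfrac1{32} & \tfrac1{16} & \tfrac18 & \tfrac14 & \tfrac12 \\
\tfrac{619}{10496} & \tfrac{575}{10496} & \tfrac1{64} & \tfrac1{32} & \tfrac1{16} & \tfrac18 & \tfrac14 \\
\tfrac{619}{10496} & \tfrac{575}{10496} & \tfrac1{64} & \tfrac1{32} & \tfrac1{16} & \tfrac18 & \tfrac14
\end{bmatrix}.
\]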
We solve the matrix equation
\[ (I - D(I + W_1 C))v_2 = 0 \]
to find the vector
\[
v_2 = \left[ \tfrac{2787}{15124},\ \tfrac{821}{3781},\ \tfrac{12337}{60496},\ \tfrac{9053}{60496},\ \tfrac{7411}{60496},\ \tfrac{7411}{120992},\ \tfrac{7411}{120992} \right]^T
\]


and, by the back substitution process of Section 15.4, the vectors 


_ = 6197 10563 14929 23661 23661 23661 ig 
113408’ 26816’ 53632’ 107264’ 214528’ 429056’ 429056 


and 


=f i 2S. ee Se, ae 
vO lA” A? 46? 8 39-64? 6d 
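As a quick sanity check on the printed values, the following sketch verifies with exact rational arithmetic that the vector $v_2$ is a fixed point of the reduced matrix $D(I + W_1 C)$. The entries are transcribed from the displays in this section; the first entry of $v_2$, 2787/15124, is an assumption, taken as the value that makes the entries sum to one. The variable and function names are illustrative only.

```python
from fractions import Fraction as Fr

# Rows of D(I + W_1 C) for the example with m = 2 (transcribed values).
ROWS = """
447/2296   491/2296   1/2  0    0    0    0
1021/4592  1065/4592  1/4  1/2  0    0    0
1849/9184  1805/9184  1/8  1/4  1/2  0    0
2677/18368 2545/18368 1/16 1/8  1/4  1/2  0
619/5248   575/5248   1/32 1/16 1/8  1/4  1/2
619/10496  575/10496  1/64 1/32 1/16 1/8  1/4
619/10496  575/10496  1/64 1/32 1/16 1/8  1/4
"""
M = [[Fr(tok) for tok in line.split()] for line in ROWS.strip().splitlines()]

# Candidate eigenvector v_2 (first entry assumed, see the lead-in).
v2 = [Fr(2787, 15124), Fr(821, 3781), Fr(12337, 60496), Fr(9053, 60496),
      Fr(7411, 60496), Fr(7411, 120992), Fr(7411, 120992)]


def apply(matrix, vec):
    """Exact matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


# A column-stochastic matrix has every column summing to one.
column_sums = [sum(M[i][j] for i in range(7)) for j in range(7)]
is_stochastic = all(s == 1 for s in column_sums)

# v_2 solves (I - M) v = 0 exactly when M v = v.
is_fixed_point = apply(M, v2) == v2
print(is_stochastic, is_fixed_point, sum(v2))
```

Because the arithmetic is exact, a single mistranscribed numerator or denominator would make one of the two checks fail, which is what makes the test useful for proofreading.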


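The probability measure is the steady-state vector $x$ given by

\[
\begin{aligned}
x = \Big[\ & \tfrac{1718}{62375}, \tfrac{983}{12475}, \tfrac{983}{24950}, \tfrac{983}{49900}, \tfrac{983}{99800}, \tfrac{983}{199600}, \tfrac{983}{199600}, \\
& \tfrac{1718}{62375}, \tfrac{3197}{62375}, \tfrac{3197}{124750}, \tfrac{3197}{249500}, \tfrac{3197}{499000}, \tfrac{3197}{998000}, \tfrac{3197}{998000}, \\
& \tfrac{3436}{62375}, \tfrac{4676}{62375}, \tfrac{569}{12475}, \tfrac{1676}{62375}, \tfrac{2183}{124750}, \tfrac{2183}{249500}, \tfrac{2183}{249500}, \\
& \tfrac{2816}{62375}, \tfrac{3888}{62375}, \tfrac{9959}{249500}, \tfrac{6071}{249500}, \tfrac{4127}{249500}, \tfrac{4127}{499000}, \tfrac{4127}{499000}, \\
& \tfrac{2787}{62375}, \tfrac{3284}{62375}, \tfrac{12337}{249500}, \tfrac{9053}{249500}, \tfrac{7411}{249500}, \tfrac{7411}{499000}, \tfrac{7411}{499000}\ \Big].
\end{aligned}
\]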
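The following sketch checks the printed vector against the model. It rebuilds $A$, $B$ and $H$ for $m = 2$ from the inflow probabilities $p_r = (0.5)^{r+1}$ and confirms, in exact arithmetic, that $x$ sums to one and satisfies $xH = x$. The block layout of $H$, the ordering of $x$ (five blocks indexed by $z_2$, each listing $z_1 = 0, \ldots, 6$) and all helper names are assumptions read off the displays in this section, not taken from the original derivation.

```python
from fractions import Fraction as Fr

N1, N2, M_PUMP = 7, 5, 2   # levels of dam 1, levels of dam 2, pump size m


def inflow_row(start):
    """Next-level distribution of dam 1 when it holds `start` units after
    pumping; inflow r has probability (1/2)**(r+1) and all inflow beyond
    the top level is lumped into the last entry."""
    row = [Fr(0)] * N1
    for level in range(start, N1 - 1):
        row[level] = Fr(1, 2 ** (level - start + 1))
    row[N1 - 1] = Fr(1, 2 ** (N1 - 1 - start))
    return row


ZERO_ROW = [Fr(0)] * N1
A = [inflow_row(i) if i < M_PUMP else ZERO_ROW for i in range(N1)]           # no pumping
B = [ZERO_ROW if i < M_PUMP else inflow_row(i - M_PUMP) for i in range(N1)]  # pumping
Z = [ZERO_ROW] * N1

# Assumed block layout of H (rows and columns indexed by z_2 = 0..4).
LAYOUT = [[A, Z, B, Z, Z],
          [A, Z, B, Z, Z],
          [Z, A, Z, B, Z],
          [Z, Z, A, Z, B],
          [Z, Z, Z, A, B]]
H = [[LAYOUT[bi][bj][i][j] for bj in range(N2) for j in range(N1)]
     for bi in range(N2) for i in range(N1)]

X_TEXT = """
1718/62375 983/12475  983/24950    983/49900   983/99800   983/199600  983/199600
1718/62375 3197/62375 3197/124750  3197/249500 3197/499000 3197/998000 3197/998000
3436/62375 4676/62375 569/12475    1676/62375  2183/124750 2183/249500 2183/249500
2816/62375 3888/62375 9959/249500  6071/249500 4127/249500 4127/499000 4127/499000
2787/62375 3284/62375 12337/249500 9053/249500 7411/249500 7411/499000 7411/499000
"""
x = [Fr(tok) for tok in X_TEXT.split()]

# Row vector times matrix: (xH)_c = sum_s x_s H[s][c].
xH = [sum(x[s] * H[s][c] for s in range(N1 * N2)) for c in range(N1 * N2)]
print(sum(x) == 1, xH == x)
```

Working with `Fraction` rather than floating point means the stationarity check is an equality test, not a tolerance test.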


Using the steady-state vector $x$ we can calculate the expected overflow of
water from the system. Let $z = z(s) = (z_1(s), z_2(s))$ for $s = 1, 2, \ldots, n$ denote
the collection of all possible states. The expected overflow is calculated by

\[ J = \sum_{s=1}^{n} \left[ \sum_{r=0}^{\infty} f[z(s) \mid r]\, p_r \right] x_s, \]

where $f[z(s) \mid r]$ is the overflow from state $z(s)$ when $r$ units of stormwater
enter the first dam. We will consider the same pumping policy for four different
values $m = 1, 2, 3, 4$ of the control parameter. We obtain the steady-state
vector $x = x[m]$ for each particular value of the control parameter and determine
the expected total overflow in each case. Table 15.1 compares the four
parameter values by considering the overflow $J_i = J_i[m]$ from the first and
second dams.
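For the policy $m = 2$ the expected overflows can be recomputed directly from the steady-state vector. The sketch below is a hedged illustration: it assumes that overflow from the first dam is the inflow in excess of the free capacity left after pumping, and that the second dam loses one unit whenever it is full ($z_2 = 4$) and receives a pump of 2 units while 1 unit is drawn off. These modelling assumptions and the function names are not stated in this section; they are chosen because they are consistent with the transition blocks $A$ and $B$ of the example.

```python
from fractions import Fraction as Fr

CAP1, M_PUMP = 6, 2   # capacity of dam 1 and pump size m

X_TEXT = """
1718/62375 983/12475  983/24950    983/49900   983/99800   983/199600  983/199600
1718/62375 3197/62375 3197/124750  3197/249500 3197/499000 3197/998000 3197/998000
3436/62375 4676/62375 569/12475    1676/62375  2183/124750 2183/249500 2183/249500
2816/62375 3888/62375 9959/249500  6071/249500 4127/249500 4127/499000 4127/499000
2787/62375 3284/62375 12337/249500 9053/249500 7411/249500 7411/499000 7411/499000
"""
# rows[z2][z1] is the steady-state probability of state (z1, z2).
rows = [[Fr(tok) for tok in line.split()] for line in X_TEXT.strip().splitlines()]


def mean_excess(free):
    """E[max(r - free, 0)] when P(r) = (1/2)**(r+1); the geometric tail
    sum equals (1/2)**free."""
    return Fr(1, 2 ** free)


def level_after_pump(z1):
    """Content of dam 1 once the policy has pumped (assumed: pump m if z1 >= m)."""
    return z1 - M_PUMP if z1 >= M_PUMP else z1


# Expected overflow from dam 1: average the mean excess over all states.
J1 = sum(prob * mean_excess(CAP1 - level_after_pump(z1))
         for row in rows for z1, prob in enumerate(row))

# Expected overflow from dam 2: one unit is lost when z2 = 4 and pumping occurs.
J2 = sum(rows[4][z1] for z1 in range(M_PUMP, CAP1 + 1))

print(J1, J2, J1 + J2)
```

Under these assumptions the sketch gives $J_1 = 1/25$ and $J_2 = 9053/62375$, so the total is $11548/62375$.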

Table 15.1 Overflow lost from the system for m = 1, 2, 3, 4

          m = 1    m = 2          m = 3       m = 4
J_1       1/7      1/25           1/25        1/17
J_2       0        9053/62375     263/1350    57/272
Total     1/7      11548/62375    317/1350    73/272

From the table it is clear that the first pumping policy results in less
overflow from the system. If pumping costs are ignored then it is clear that
the policy m = 1 is the best. Of course, in a real system there are likely to
be other cost factors to consider. It is possible that less frequent pumping of
larger volumes may be more economical.
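The comparison of the four policies can be made explicit by ranking the total overflows with exact fractions. The following is a small sketch using the totals quoted in Table 15.1; the dictionary layout is illustrative.

```python
from fractions import Fraction as Fr

# Total expected overflow for each control parameter m (Table 15.1).
totals = {1: Fr(1, 7), 2: Fr(11548, 62375), 3: Fr(317, 1350), 4: Fr(73, 272)}

# Policies ordered from least to most overflow.
ranking = sorted(totals, key=totals.get)
best_m = ranking[0]

for m in ranking:
    print(m, totals[m], round(float(totals[m]), 4))
```

The total overflow increases with m for these four policies, so m = 1 is preferred only while pumping costs are left out of the objective.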


15.8 Justification of inverses 


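To justify the solution procedure described earlier we will show that $W_r$ is
well defined for $r = 0, 1, \ldots, k - m - 1$. From the definition of the transition
matrix $H = H(A, B)$ in Subsection 15.2.2 and the subsequent definition of
$C = A^T$ and $D = B^T$ we can see that $C$ and $D$ can be written in block form
as

\[
C = \begin{bmatrix} L_1 & 0 \\ M_1 & 0 \end{bmatrix}, \qquad
D = \begin{bmatrix} 0 & L_2 \\ 0 & M_2 \end{bmatrix},
\]

where $L_1 \in \mathbb{R}^{m \times m}$ and $M_1 \in \mathbb{R}^{(h-m+1) \times m}$ are given by

\[
L_1 = \begin{bmatrix}
p_0 & 0 & \cdots & 0 \\
p_1 & p_0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
p_{m-1} & p_{m-2} & \cdots & p_0
\end{bmatrix}, \qquad
M_1 = \begin{bmatrix}
p_m & p_{m-1} & \cdots & p_1 \\
p_{m+1} & p_m & \cdots & p_2 \\
\vdots & \vdots & & \vdots \\
p_{h-1} & p_{h-2} & \cdots & p_{h-m} \\
p_h^+ & p_{h-1}^+ & \cdots & p_{h-m+1}^+
\end{bmatrix},
\]

and where $L_2 \in \mathbb{R}^{m \times (h-m+1)}$ and $M_2 \in \mathbb{R}^{(h-m+1) \times (h-m+1)}$ are given by

\[
L_2 = \begin{bmatrix}
p_0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
p_1 & p_0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
p_{m-1} & p_{m-2} & \cdots & p_0 & 0 & \cdots & 0
\end{bmatrix}
\]

and

\[
M_2 = \begin{bmatrix}
p_m & \cdots & p_1 & p_0 & 0 & \cdots & 0 \\
p_{m+1} & \cdots & p_2 & p_1 & p_0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \ddots & \vdots \\
p_{h-1} & \cdots & \cdots & \cdots & \cdots & \cdots & p_{m-1} \\
p_h^+ & \cdots & p_{h-m+1}^+ & p_{h-m}^+ & \cdots & \cdots & p_m^+
\end{bmatrix}.
\]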


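15.8.1 Existence of the matrix $W_0$

Provided $p_j > 0$ for all $j \le h$ the matrix $L_1$ is strictly sub-stochastic [3] with
$\mathbf{1}^T L_1 < \mathbf{1}^T$. It follows that $(I - L_1)^{-1}$ is well defined and hence

\[
P = (I - C)^{-1} = \begin{bmatrix} (I - L_1)^{-1} & 0 \\ M_1 (I - L_1)^{-1} & I \end{bmatrix}
\]

is also well defined. Note also that $\mathbf{1}^T C = [\mathbf{1}^T, \mathbf{0}^T]$ and $\mathbf{1}^T D = [\mathbf{0}^T, \mathbf{1}^T]$. We
begin with an elementary but important result.

Lemma 3. With the above definitions,
\[ \mathbf{1}^T DP = \mathbf{1}^T \quad \text{and} \quad \mathbf{1}^T DPC^{m-1} < \mathbf{1}^T \qquad (15.1) \]
and the matrix $W_0 = [I - DPC^{m-1}]^{-1}$ is well defined.

Proof. We have
\[ \mathbf{1}^T C + \mathbf{1}^T D = \mathbf{1}^T \;\Rightarrow\; \mathbf{1}^T D = \mathbf{1}^T (I - C) \;\Rightarrow\; \mathbf{1}^T DP = \mathbf{1}^T. \]
Hence

\[
\begin{aligned}
\mathbf{1}^T DPC^{m-1} = \mathbf{1}^T C^{m-1}
&= [\mathbf{1}^T, \mathbf{1}^T] \begin{bmatrix} L_1^{m-1} & 0 \\ M_1 L_1^{m-2} & 0 \end{bmatrix} \\
&= [\mathbf{1}^T L_1^{m-1} + \mathbf{1}^T M_1 L_1^{m-2}, \mathbf{0}^T] \\
&= [(\mathbf{1}^T L_1 + \mathbf{1}^T M_1) L_1^{m-2}, \mathbf{0}^T] \\
&= [\mathbf{1}^T L_1^{m-2}, \mathbf{0}^T] < \mathbf{1}^T
\end{aligned}
\]

for $m \ge 3$. Hence $W_0 = [I - DPC^{m-1}]^{-1}$ is well defined.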
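The column-sum facts used in this argument are easy to confirm on the two-dam example of Section 15.7. The sketch below assumes the geometric inflow $p_r = (0.5)^{r+1}$ with $h = 6$ and $m = 2$, builds $C = A^T$ and $D = B^T$, and checks that $\mathbf{1}^T C = [\mathbf{1}^T, \mathbf{0}^T]$, $\mathbf{1}^T D = [\mathbf{0}^T, \mathbf{1}^T]$ and $\mathbf{1}^T L_1 < \mathbf{1}^T$. The helper names are illustrative, and the form of $A$ and $B$ is the one read from that example.

```python
from fractions import Fraction as Fr

H_CAP, M_PUMP = 6, 2
SIZE = H_CAP + 1


def inflow_row(start):
    """Next-level distribution for dam 1 starting from `start` units, with
    P(r) = (1/2)**(r+1) and the tail mass placed on the top level."""
    row = [Fr(0)] * SIZE
    for level in range(start, SIZE - 1):
        row[level] = Fr(1, 2 ** (level - start + 1))
    row[SIZE - 1] = Fr(1, 2 ** (SIZE - 1 - start))
    return row


zero = [Fr(0)] * SIZE
A = [inflow_row(i) if i < M_PUMP else zero for i in range(SIZE)]
B = [zero if i < M_PUMP else inflow_row(i - M_PUMP) for i in range(SIZE)]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def column_sums(mat):
    """The row vector 1^T mat."""
    return [sum(col) for col in zip(*mat)]


C, D = transpose(A), transpose(B)
ones_C = column_sums(C)                    # 1^T C
ones_D = column_sums(D)                    # 1^T D
L1 = [row[:M_PUMP] for row in C[:M_PUMP]]  # leading m x m block of C
L1_sums = column_sums(L1)                  # 1^T L_1
print(ones_C, ones_D, L1_sums)
```

Since $\mathbf{1}^T C + \mathbf{1}^T D = \mathbf{1}^T$ holds here entry by entry, the identity $\mathbf{1}^T D = \mathbf{1}^T (I - C)$ used in the proof follows immediately for this example.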


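15.8.2 Existence of the matrix $W_p$ for $1 \le p \le m - 1$


We will consider the matrix
\[ W_1 = [I - DC^{m-2}(W_0 C)]^{-1}. \]
We have
\[
\begin{aligned}
\mathbf{1}^T DC^{m-2} + \mathbf{1}^T DPC^{m-1} &= \mathbf{1}^T D[I + PC]C^{m-2} \\
&= \mathbf{1}^T DPC^{m-2} \\
&= \mathbf{1}^T C^{m-2}.
\end{aligned}
\]
Since $\mathbf{1}^T C^{m-2} \le \mathbf{1}^T$ we deduce that
\[ \mathbf{1}^T DC^{m-2} + \mathbf{1}^T DPC^{m-1} \le \mathbf{1}^T, \]
from which it follows that
\[ \mathbf{1}^T DC^{m-2} \le \mathbf{1}^T [I - DPC^{m-1}] \]
and hence that
\[ \mathbf{1}^T DC^{m-2}(W_0 C) \le \mathbf{1}^T C \le \mathbf{1}^T. \]

Since the column sums of the matrix $DC^{m-2}(W_0 C)$ are all non-negative and
less than one, it follows that the inverse matrix
\[ W_1 = [I - DC^{m-2}(W_0 C)]^{-1} \]
is well defined.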


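Lemma 4. With the above definitions the matrix $W_p$ is well defined for each
$p = 0, 1, \ldots, m - 1$ and for each such $p$ we have
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-1-p} C^j \right] \prod_{t=0}^{p-1} (W_t C) \le \mathbf{1}^T C^p. \]
In the special case when $p = m - 1$ the inequality becomes
\[ \mathbf{1}^T D \prod_{t=0}^{m-2} (W_t C) \le \mathbf{1}^T C^{m-1}. \]

Proof. The proof is by induction. We note that
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-2} C^j \right] + \mathbf{1}^T DPC^{m-1} = \mathbf{1}^T DP \]
and hence
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-2} C^j \right] = \mathbf{1}^T [I - DPC^{m-1}], \]
from which it follows that
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-2} C^j \right] (W_0 C) = \mathbf{1}^T C. \]

Thus the result is true for $p = 1$. Suppose the result is true for $1 \le s \le p - 1$.
Then
\[ \mathbf{1}^T DC^{m-p} \prod_{t=0}^{p-2} (W_t C) \le \mathbf{1}^T C^{p-1} \le \mathbf{1}^T \]
and hence
\[ W_{p-1} = [I - DC^{m-p}(W_0 C) \cdots (W_{p-2} C)]^{-1} \]
is well defined. Now we have
\[
\begin{aligned}
\mathbf{1}^T D \left[ \sum_{j=0}^{m-1-p} C^j \right] \prod_{t=0}^{p-2} (W_t C)
 + (\mathbf{1}^T C^{p-1}) DC^{m-p} \prod_{t=0}^{p-2} (W_t C)
 &\le \mathbf{1}^T D \left[ \sum_{j=0}^{m-p} C^j \right] \prod_{t=0}^{p-2} (W_t C) \\
 &\le \mathbf{1}^T C^{p-1}
\end{aligned}
\]
and hence
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-1-p} C^j \right] \prod_{t=0}^{p-2} (W_t C) \le (\mathbf{1}^T C^{p-1}) \left[ I - DC^{m-p} \prod_{t=0}^{p-2} (W_t C) \right], \]
from which it follows that
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-1-p} C^j \right] \prod_{t=0}^{p-2} (W_t C)\, W_{p-1} \le \mathbf{1}^T C^{p-1}. \]

If we multiply on the right by $C$ we obtain the desired result
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-1-p} C^j \right] \prod_{t=0}^{p-1} (W_t C) \le \mathbf{1}^T C^p. \]

Thus the result is also true for $s = p$. This completes the proof.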


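15.8.3 Existence of the matrix $W_q$ for $m \le q \le k - m - 1$


We need to establish some important identities.

Lemma 5. The JP identities of the first kind
\[
\mathbf{1}^T D \left\{ I + \sum_{j=1}^{p-1} \left[ \prod_{t=p-j}^{p-1} (W_t C) \right] + \left[ \sum_{j=0}^{m-p-1} C^j \right] \prod_{t=0}^{p-1} (W_t C) \right\} = \mathbf{1}^T \qquad (15.2)
\]
are valid for each $p = 1, 2, \ldots, m - 1$.

Proof. From the identity
\[ \sum_{j=0}^{m-2} C^j + PC^{m-1} = P \]
and Lemma 3 we deduce that
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-2} C^j + PC^{m-1} \right] = \mathbf{1}^T. \]
By rearranging this identity we have
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-2} C^j \right] = \mathbf{1}^T [I - DPC^{m-1}] \]
and hence
\[ \mathbf{1}^T D \left[ \sum_{j=0}^{m-2} C^j \right] (W_0 C) = \mathbf{1}^T C, \]
from which it follows that
\[ \mathbf{1}^T D \left\{ I + \left[ \sum_{j=0}^{m-2} C^j \right] (W_0 C) \right\} = \mathbf{1}^T C + \mathbf{1}^T D = \mathbf{1}^T. \]

Therefore the JP identity of the first kind is valid for $p = 1$. We will use
induction to establish the general identity. Let $p \ge 1$ and suppose the result
is true for $s = p < m - 1$. From
\[
\mathbf{1}^T D \left\{ I + \sum_{j=1}^{p-1} \left[ \prod_{t=p-j}^{p-1} (W_t C) \right] + \left[ \sum_{j=0}^{m-p-1} C^j \right] \prod_{t=0}^{p-1} (W_t C) \right\} = \mathbf{1}^T
\]
we deduce that
\[
\mathbf{1}^T D \left\{ I + \sum_{j=1}^{p-1} \left[ \prod_{t=p-j}^{p-1} (W_t C) \right] + \left[ \sum_{j=0}^{m-p-2} C^j \right] \prod_{t=0}^{p-1} (W_t C) \right\}
= \mathbf{1}^T [I - DC^{m-p-1} (W_0 C) \cdots (W_{p-1} C)]
\]
and hence that
\[
\mathbf{1}^T D \left\{ I + \sum_{j=1}^{p-1} \left[ \prod_{t=p-j}^{p-1} (W_t C) \right] + \left[ \sum_{j=0}^{m-p-2} C^j \right] \prod_{t=0}^{p-1} (W_t C) \right\} (W_p C) = \mathbf{1}^T C.
\]

If we rewrite this in the form
\[
\mathbf{1}^T D \left\{ \sum_{j=1}^{p} \left[ \prod_{t=p+1-j}^{p} (W_t C) \right] + \left[ \sum_{j=0}^{m-p-2} C^j \right] \prod_{t=0}^{p} (W_t C) \right\} = \mathbf{1}^T C
\]
then it is clear that
\[
\mathbf{1}^T D \left\{ I + \sum_{j=1}^{p} \left[ \prod_{t=p+1-j}^{p} (W_t C) \right] + \left[ \sum_{j=0}^{m-p-2} C^j \right] \prod_{t=0}^{p} (W_t C) \right\}
= \mathbf{1}^T C + \mathbf{1}^T D = \mathbf{1}^T.
\]

Hence the result is also true for $s = p + 1$. This completes the proof.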
Lemma 6. The matrix $W_q$ exists for $q = m, m + 1, \ldots, k - m - 1$ and for
each such $q$ the JP identities of the second kind
\[
\mathbf{1}^T D \left\{ I + \sum_{j=1}^{m-1} \left[ \prod_{t=q-j}^{q-1} (W_t C) \right] \right\} = \mathbf{1}^T \qquad (15.3)
\]
are also valid.


Proof. The JP identities of the second kind are established in the same way 
that we established the JP identities of the first kind but care is needed 
because it is necessary to establish that each W, is well defined. From Lemma 
5 the JP identity of the first kind for p = 1 is 


m—3 
17D ¢ I+ |>_ C4] (WoC) > =1°. 
j=0 


Therefore 
m—-3 
i D4 t+ Ci| (WoC) $ +17 DC™ (WoC) =17 
j=0 
and hence 
m—3 : 
TDs It C3| (WoC) $} = 17 [TF — DC™2(WoC)), 
from which we obtain


m—-3 
17D ¢ I+ | 5° C4} (WoC) > (WiC) = 17C. 
j=0 


In general if we suppose that 


m—p p-2 
1 he C?| (WoC) (W.C) <17C? 
j=0 t=1 
then we have 
m—p-1 p—2 p—2 
I7DVI+| S> C*} (WoC) (WC) + (17C?-*)Do™? |] (WC) 
j=0 t=1 t=0 
enGe = 
and hence 


ITDV I+} S> C*| (WoC) > [][ (MC) 


m—p-1 p—2 
, a 


+ 


<1 OPT pom TT wc) 


t=0 
from which we obtain 
p-1 


m—p—1 
ITD < I+} S> C*} (Woc) > [[ (Mc) < 17% CP. 
j=0 


=1 


+ 


By continuing this process until Wo is eliminated we obtain the inequality 


m-1 
1D | Wwie) < 170", 


t=1 
Therefore the matrix 
Wm = [I — D(WiC)---(Wm—1C)|* 
is well defined. The JP identity of the first kind with p = m— 1 gives 


m—2 m—2 m—2 
ITD \I+)- (W,C)| + [] (mc) > =17 


j=l | t=m-1-j t=0 
which, after rearrangement, becomes


iD pe Tl (W,C) =1"[r-p T] mo) 


j=1 | t=m—1-j t=0 
and allows us to deduce 


m—2 m—2 
I7DLT+ >>| TL WC) ? (Wa-i0) =17C. 


g=l1 | t=m—-1-j 


If we rewrite this in the form 


m-1 m—-1 m-1 
17D, S> | TT moy| + [[ mc)? =17e 
j=1 |t=m—j t=0 
then it follows that 
m-1 m-1 
i? pD2T+ I] (me) > =1". 
j=l |t=m—j 


Thus the JP identity of the second kind is true for g = m. We proceed by 
induction. We suppose that the matrix W, is well defined for each s with 
m<s<q<k—m-—1 and that the JP identity of the second kind 


m-1 s—l 


I7D¢I+ >_ | IT] (MC)| > =17 


j=1 |t=s—j 
is valid for these values of s. Therefore 
=5 


m q-1 q-1 
I7DiI+S>] [[ Wo] pt+i7D JT] Moq=it 
=q-Jj 


j=l |t j t=q-—m+1 


and hence 


m—2 q-1 


(WC) |} = IT — DWy-m41€) + (Wo-10)] 


J=1 |t=a-3 
Since Wy = [I — D(Wg—m-+4iC)--:(Wa-1C)]~* is well defined, we have 


m—2 q-1 
Ppere (W.C)| > (W,C) =17C 


J=1 | t=a-3 
which we rewrite as
m-1 q 
iD I] (™o)| > =17e 
g=1 | t=+1-3 


and from which it follows that 


m-1 q 


I7DiI+>) | [J HC) > =17. 


j=l | t=qt+1—J 


Hence the JP identity of the second kind is valid for s = q+ 1. To show that 
W,+1 is well defined, we must consider two cases. When g < 2m — 2 we set 
p=q—m-+2 in the JP identity of the first kind to give 


qt+l-m qtl-—m 2m—q-1 j q+1l-m 
17D \I+ S- I] mol+| So cy) IT] mo) =1". 
j=l |t=q—m4+2-j j=0 t=0 
Therefore 
qtl-m qt+l-m 2m—q—2 qt+l-m 
ITD \I+ \ I] mo}l+}) SS) cy) TL mo) 
j=l | t=q—m4+2-j j=0 t=0 
= 17[T — DO?™-1! (WoC) +++ (Wo41—-mC)] 
and hence 
qtl-—m "Tl m 
(ay Be a es d, W,C) 
j t=q- Il —j 
2m—q—-2 q+1l-m 
+ = Ci) [J (WC) > Wy-m42€) = 17C. 
j t=0 


Since 17C < 1° it follows that 


qtl-m qt+l-m 


ITD \I+ > I] m™ 


t=q-—m+2-— 9 

2m—q-—3 qt+l1-—m 
+] do C8) TT 40) } (We-m420) 

j=0 t=0 


q-—m+2 
+(17C)DC?"-* JT (Wc) <1TC 


t=0 
and hence that


qtl-m qtl-m 


ITD\I+ > I] mo 


t=q-—m+2— 9 
2m—q-—3 qt+l1l—m 
+) So C8) TT MC) } (We-m420) 
j=0 t=0 


2 (Or po Oi WO 
Now we can deduce that 


qt+l-m qt+l-m 


i? Dd T+ S- II (WiC) 


t=q-—Mm+2—- J 
2m—q-3 qtl-m 
- ye O I (WiC) > (Wa—-m+2C)(Wg—-m+3C) < 17C?. 
j=0 t=0 


If we continue this process the terms of the second sum on the left-hand side 
will be eliminated after 2m — q steps, at which stage we have 


q+l—m q+1—m m1 
Le De ie. \ I] ™o [Woe ree 
j t=q-—m+2-j t=q-—m+2 


The details change slightly but we continue the elimination process. Since 
(17C?™—1) < 17 we now have 


q-m q+1—m m1 
ITD sI+>~ I] mo I] mo) 
J=1 | t=q-m+2-) t=q-—m+2 
m—-1 
Pi yD VGC) Se O28 
t=1 
and hence 
qt+1—-m m1 
I] mo I] mo 
= t=q-m+2-j t=q-—m+2 


< ee 1)(T — D(W,C)---(Wm_1C)], 


from which we obtain 
q-m q+l—m m
gwen araen I] mo Tl mare 
j=l | t=q—m+2-j t=q-—m+2 


The elimination continues in the same way until we eventually conclude that 


q 
1B: TT (VC)s 12C™ ar" 


t=q—m+2 


and hence establish that $W_{q+1} = [I - D(W_{q-m+2} C) \cdots (W_q C)]^{-1}$ is well
defined. A similar but less complicated argument can be carried through
using the appropriate JP identity of the second kind when $q > 2m - 1$. Hence
the matrix $W_s$ is well defined for $s = q + 1$. This completes the proof.


15.9 Summary 


We have established a general method of analysis for a class of simple con- 
trol policies in a system of two connected dams where we assume a stochastic 
supply and regular demand. We calculated steady-state probabilities for each 
particular policy within the class and hence determined the expected overflow 
from the system. A key finding is that calculation of the steady-state proba- 
bility vector for a large system can be reduced to a much smaller calculation 
using the block matrix structure. 

We hope to extend our considerations to more complex control policies 
in which the decision to pump from the first dam requires that the content
of the first dam exceeds a particular level $m_1$ and also that the content
of the second dam is less than the level $m_2 = k - m_1$. We observe
that for this class the transition matrix can be written in block matrix
form using the matrices A and B described in this article in almost the 
same form but with the final rows containing only one non-zero block ma- 
trix R. Thus it seems likely that the methodology presented in this chapter 
could be adapted to provide a general analysis for this new class of pumping 
policies. Ultimately we would like to extend our considerations to include 
more complicated connections and the delays associated with treatment of 
stormwater. 

We also believe a similar analysis is possible for the policies considered 
in this chapter when a continuous state space is used for the first dam. The 
matrices must be replaced by linear integral operators but the overall block 
structure remains the same. 
Chapter 16


Optimal design of linear consecutive-k-out-of-n systems


Malgorzata O'Reilly


Abstract A linear consecutive-k-out-of-n:F system is an ordered sequence
of n components that fails if and only if at least k consecutive components
fail. A linear consecutive-k-out-of-n:G system is an ordered sequence of n
components that works if and only if at least k consecutive components work.
This chapter establishes necessary conditions for the variant optimal design
and procedures to improve designs not satisfying these conditions for linear
consecutive systems with 2k ≤ n ≤ 3k.


Key words: Linear consecutive-k-out-of-n:F system, linear
consecutive-k-out-of-n:G system, variant optimal design, singular design,
nonsingular design


16.1 Introduction 


16.1.1 Mathematical model 


A linear consecutive-k-out-of-n:F system ([11], [20]-[23]) is a system of n
components ordered in a line, such that the system fails if and only if at least
k consecutive components fail. A linear consecutive-k-out-of-n:G system is
a system of n components ordered in a line, such that the system works if and
only if at least k consecutive components work. A particular arrangement of
components in a system is referred to as a design and a design that maximizes
system reliability is referred to as optimal. We assume the following:

1. the system is either in a failing or a working state;
2. each component is either in a failing or a working state;
3. the failures of the components are independent;
4. component reliabilities are distinct and within the interval (0, 1).

The fourth assumption is made for the clarity of presentation, without
loss of generality. Cases that include reliabilities 0 and 1 can be viewed as
limits of other cases. Some of the strict inequalities will become nonstrict
when these cases are included. Note also that, in all procedures, an improved
design $X = (q_1, \ldots, q_n)$ and its reverse $(q_n, \ldots, q_1)$ are considered to be
equivalent.
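To make the two definitions concrete, the following sketch computes the reliability of a small design by enumerating every working/failed pattern, using the independence assumption above. It is a brute-force illustration only (exponential in n); the function names and the three-component example are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product


def has_run(flags, k):
    """True if `flags` contains at least k consecutive True values."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= k:
            return True
    return False


def reliability(p, k, kind="F"):
    """Reliability of a linear consecutive-k-out-of-n system.

    p    -- component reliabilities in design order
    kind -- "F": fails iff k consecutive components fail
            "G": works iff k consecutive components work
    """
    total = Fr(0)
    for works in product([True, False], repeat=len(p)):
        # Independent components: multiply the individual probabilities.
        prob = Fr(1)
        for w, pi in zip(works, p):
            prob *= pi if w else 1 - pi
        if kind == "F":
            system_works = not has_run([not w for w in works], k)
        else:
            system_works = has_run(works, k)
        if system_works:
            total += prob
    return total


design = [Fr(9, 10), Fr(4, 5), Fr(7, 10)]
print(reliability(design, 2, "F"), reliability(design, 2, "G"))
```

For this design with k = 2 the F system fails only when the middle component and at least one neighbour fail, which gives reliability 463/500; the G system works only when the middle component and at least one neighbour work, which gives 97/125.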


16.1.2 Applications and generalizations of linear
consecutive-k-out-of-n systems


Two classic examples of consecutive-2-out-of-n:F systems were given by
Chiang and Niu in [11]:

• a telecommunication system with n relay stations (satellites or ground
  stations) which fails when at least 2 consecutive stations fail; and
• an oil pipeline system with n pump stations which fails when at least 2
  consecutive pump stations are down.

Kuo, Zhang and Zuo [24] gave the following example of a linear
consecutive-k-out-of-n:G system:

• consider n parallel-parking spaces on a street, with each space being
  suitable for one car. The problem is to find the probability that a bus, which
  takes 2 consecutive spaces, can park on this street.

More examples of these systems are in [2, 19, 34, 35, 38]. For a review of
the literature on consecutive-k-out-of-n systems the reader is referred to [8].
Also see [5] by Chang, Cui and Hwang.

Introducing more general assumptions and considering system topology
has led to some generalizations of consecutive-k-out-of-n systems. These
are listed below:

• consecutively connected systems [32];
• linearly connected systems [6, 7, 14];
• consecutive-k-out-of-m-from-n:F systems [36];
• consecutive-weighted-k-out-of-n:F systems [37];
• m-consecutive-k-out-of-n:F systems [15];
• 2-dimensional consecutive-k-out-of-n:F systems [31];
• connected-X-out-of-(m, n):F lattice systems [3];
• connected-(r, s)-out-of-(m, n):F lattice systems [27];
• k-within-(r, s)-out-of-(m, n):F lattice systems [27];
• consecutively connected systems with multi-state components [27];
• generalized multi-state k-out-of-n:G systems [16];
• combined k-out-of-n:F, consecutive-k-out-of-n:F and linear
  connected-(r, s)-out-of-(m, n):F system structures [39].

A number of related, more realistic systems have also been reported
[1, 9, 30].

Linear consecutive-k-out-of-n:F systems have been used to model vacuum
systems in accelerators [18], computer ring networks [17], systems from
the field of integrated circuits [4], belt conveyors in open-cast mining [27]
and the exploration of distant stars by spacecraft [10]. Applications of
generalized consecutive systems include medical diagnosis [31], pattern detection
[31], evaluation of furnace systems in the petro-chemical industry [39] and a
shovel-truck system in an open mine [16].


16.1.3 Studies of consecutive-k-out-of-n systems


Studies of the optimal designs of consecutive-k-out-of-n systems have
resulted in establishing two types of optimal designs: invariant and variant
designs. The optimality of an invariant optimal design depends only on the
ordering of the component reliabilities, not on their numerical values.
Conversely, the optimality of variant optimal designs is contingent on those
numerical values. Malon [26] has noticed that, in practice, it may be sufficient
to know the ages of the components to be able to order them according to their
reliabilities. This has an important implication when an optimal design of the
system is invariant, that is, independent of the component reliabilities. For
such optimal designs, one does not need to know the exact component
reliabilities to be able to order components in an optimal way. A linear
consecutive-k-out-of-n:F system has an invariant optimal design only for
k ∈ {1, 2, n − 2, n − 1, n} [26]. The invariant optimal design for linear
consecutive-2-out-of-n:F systems has been given by Derman, Lieberman and
Ross [12] and proven by Malon [25] and Du and Hwang [13]. The invariant
optimal designs for linear consecutive-k-out-of-n:F systems with
k ∈ {n − 2, n − 1} have been established by Malon [26].

A linear consecutive-k-out-of-n:G system has an invariant optimal design
only for k ∈ {1, n − 2, n − 1, n} and for n/2 ≤ k < n − 2 [40]. The invariant
optimal design for linear consecutive-k-out-of-n:G systems with
n/2 ≤ k ≤ n − 1 has been given by Kuo et al. [24]. Zuo and Kuo [40] have
summarized the complete results on the invariant optimal designs of
consecutive-k-out-of-n systems. Table 16.1 lists all invariant optimal designs
of linear consecutive-k-out-of-n systems and has been reproduced from [40].
The assumed order of component reliabilities is p_1 < p_2 < ... < p_n. The
symbol ω represents any possible arrangement.

Table 16.1 Invariant optimal designs of linear consecutive-k-out-of-n systems

k                  F system                               G system
k = 1              ω                                      ω
k = 2              (1, n, 3, n−2, ..., n−3, 4, n−1, 2)    −
2 < k < n/2        −                                      −
n/2 ≤ k < n − 2    −                                      (1, 3, ..., 2(n−k)−1, ω, 2(n−k), ..., 4, 2)
k = n − 2          (1, 4, ω, 3, 2)                        (1, 3, ..., 2(n−k)−1, ω, 2(n−k), ..., 4, 2)
k = n − 1          (1, ω, 2)                              (1, 3, ..., 2(n−k)−1, ω, 2(n−k), ..., 4, 2)
k = n              ω                                      ω

In all cases where an invariant optimal design is not listed, only variant
optimal designs exist.
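The k = 2 entry for F systems can be illustrated numerically. The sketch below takes n = 5, assigns hypothetical reliabilities to components ranked 1 (least reliable) to 5 (most reliable), and compares the design (1, 5, 3, 4, 2) with every other permutation by brute force. The reliability values and function names are assumptions made for the illustration; the check covers this one instance and is not a proof of invariance.

```python
from fractions import Fraction as Fr
from itertools import permutations, product


def f_system_reliability(p, k):
    """Probability that no k consecutive components fail (brute force)."""
    total = Fr(0)
    for works in product((True, False), repeat=len(p)):
        prob = Fr(1)
        run = longest = 0
        for w, pi in zip(works, p):
            prob *= pi if w else 1 - pi
            run = 0 if w else run + 1      # current run of failed components
            longest = max(longest, run)
        if longest < k:
            total += prob
    return total


# Hypothetical reliabilities, indexed by rank: p_1 < p_2 < ... < p_5.
reliab = {1: Fr(1, 2), 2: Fr(3, 5), 3: Fr(7, 10), 4: Fr(4, 5), 5: Fr(9, 10)}
invariant = (1, 5, 3, 4, 2)   # (1, n, 3, ..., 4, n-1, 2) for n = 5


def value(design):
    return f_system_reliability([reliab[c] for c in design], 2)


best = max(value(d) for d in permutations(reliab))
print(value(invariant) == best)
```

The maximum is attained by the tabulated design and, as expected, by its reverse (2, 4, 3, 5, 1), which has the same adjacent pairs.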

Linear consecutive-k-out-of-n systems have variant optimal designs for
all F systems with 2 < k < n − 2 and all G systems with 2 ≤ k < n/2.
For these systems, the information about the order of component reliabilities
is not sufficient to find the optimal design. In fact, one needs to know
the exact value of component reliabilities. This is because different sets of
component reliabilities produce different optimal designs, so that for a given
linear consecutive-k-out-of-n system there is more than one possible optimal
design.

Zuo and Kuo [40] have proposed methods for dealing with the variant optimal
design problem which are based upon the following necessary conditions
for optimal design, proved by Malon [26] for linear consecutive-k-out-of-n:F
systems and extended by Kuo et al. [24] to linear consecutive-k-out-of-n:G
systems:

(i) components from positions 1 to min{k, n − k + 1} are arranged in
    nondecreasing order of component reliability;
(ii) components from positions n to max{k, n − k + 1} are arranged in
    nondecreasing order of component reliability;
(iii) the (2k − n) most reliable components are arranged from positions
    (n − k + 1) to k in any order if n < 2k.


In the case when n ≥ 2k, a useful concept has been that of singularity,
which has been also applied in invariant optimal designs [13]. A design
$X = (q_1, q_2, \ldots, q_n)$ is singular if for symmetrical components $q_i$ and $q_{n+1-i}$,
$1 \le i \le [n/2]$, either $q_i > q_{n+1-i}$ or $q_i < q_{n+1-i}$ for all $i$; otherwise the
design is nonsingular. According to Shen and Zuo [33] a necessary condition
for the optimal design of a linear consecutive-k-out-of-n:G system with
n ∈ {2k, 2k + 1} is for it to be singular. In [28] we have shown that a necessary
condition for the optimal design of linear consecutive-k-out-of-n:F
systems with 2k ≤ n ≤ (2k + 1) is for it to be nonsingular. Procedures to
improve designs not satisfying necessary conditions for the optimal design of
linear consecutive-k-out-of-n:F and linear consecutive-k-out-of-n:G were
also given. The significance of these results was illustrated by an example
showing that designs satisfying these necessary conditions can be better than
designs satisfying other known necessary conditions.
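The definition of singularity translates directly into a short test on symmetric pairs. The following is a minimal sketch; the function name and the sample designs are hypothetical, and the middle component of an odd-length design is ignored, as the index range $1 \le i \le [n/2]$ implies.

```python
def is_singular(design):
    """Return True if every symmetric pair (q_i, q_{n+1-i}) is ordered the
    same way, that is, all q_i > q_{n+1-i} or all q_i < q_{n+1-i}."""
    n = len(design)
    pairs = [(design[i], design[n - 1 - i]) for i in range(n // 2)]
    return all(a > b for a, b in pairs) or all(a < b for a, b in pairs)


singular_example = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05]      # every left value is larger
nonsingular_example = [0.5, 0.1, 0.3, 0.2, 0.4, 0.05]   # second pair is reversed
print(is_singular(singular_example), is_singular(nonsingular_example))
```

Reversing a design swaps the two alternatives in the definition, so a design and its reverse are always both singular or both nonsingular.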


16.1.4 Summary of the results 


In this chapter we treat the case 2k + 2 ≤ n ≤ 3k and explore whether the
results of Shen and Zuo [33] and O'Reilly [28] can be extended to this case. The
proofs included here are more complicated and the produced results do not
exactly mirror those when 2k ≤ n ≤ 2k + 1. We find that, although the
necessary conditions for the optimal design of linear consecutive-k-out-of-n:F
systems in the cases 2k ≤ n ≤ 2k + 1 and 2k + 2 ≤ n ≤ 3k are similar, the
procedures to improve designs not satisfying this necessary condition differ in the
choice of interchanged components. Furthermore, the necessary conditions for
linear consecutive-k-out-of-n:G systems in these two cases are significantly
different. In the case when 2k + 2 ≤ n ≤ 3k, the requirement for the optimal
design of a linear consecutive-k-out-of-n:G system to be singular holds
only under certain limitations. Examples of nonsingular and singular optimal
designs are given. The theorems are built on three subsidiary propositions,
which are given in Sections 16.2 and 16.4. Proposition 16.4.1 itself requires
some supporting lemmas which are the substance of Section 16.3. The main
results for this case are presented in Section 16.5. The ideas are related to
those in the existing literature, though the detail is somewhat complicated.
The arguments are constructive and based on the following.

Suppose $X = (q_1, \ldots, q_{2k+m})$ is a design and $\{q_{i_1}, \ldots, q_{i_r}\}$ is an arbitrary

• proper subset of $\{q_1, \ldots, q_k\}$ when $m \le 1$, or
• nonempty subset of $\{q_m, \ldots, q_k\}$ when $m > 1$.

We denote by $X^* = (q_1^*, \ldots, q_{2k+m}^*)$ the design obtained from $X$ by
interchanging symmetrical components $q_{i_j}$ and $q_{2k+m+1-i_j}$ for all $1 \le j \le r$.
We show that a number of inequalities exist between quantities defined
from $X$ and the corresponding quantities defined for a generic $X^*$. We
use the notation $X^*$ in this way throughout this chapter without further
comment.

Theorem 16.5.1 of Section 16.5 rules out only one type of design of
consecutive-k-out-of-(2k + m):F systems: singular designs. However, we emphasize
that the results for consecutive-k-out-of-(2k + m):G systems in Theorem
16.5.2 and Corollary 16.5.2, obtained by symmetry from Theorem 16.5.1,
significantly reduce the number of designs to be considered in algorithms
searching for an optimal design when m is small. For example, for m = 2,
when we apply the necessary condition stated in Corollary 16.5.2, the
number of designs to be considered reduces from $(2k - 2)!$ to $(2k - 2)!/2^{k-1}$ if
$(q_1, q_{k+1}, q_{k+2}, q_{2k+2})$ is singular (which occurs with probability 0.5 when a
design is chosen randomly). We note that, except for the necessary conditions
mentioned in Section 16.1.3, little is known about the variant optimal designs.
We establish more necessary conditions for the variant optimal design in [29],
also appearing in this volume, which is an important step forward in studying
this difficult problem.


16.2 Propositions for R and M 


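Throughout this chapter we adopt the convention
\[ \prod_{s \in \emptyset} q_s \equiv 1, \qquad a + b - ab \equiv (a \oplus b), \]
and make use of the following definitions.


Definition 1. Let $X = (q_1, \ldots, q_{2k+m})$, $2 \le m \le k$, $l \le k$ and $\{q_{i_1}, \ldots, q_{i_r}\}$
be an arbitrary nonempty subset of $\{q_m, \ldots, q_k\}$. We define

\[
\begin{aligned}
A(X) &= \prod_{s \in \{i_1, \ldots, i_r\}} q_s, \\
A'(X) &= \prod_{s \in \{i_1, \ldots, i_r\}} q_{2k+m+1-s}, \\
B_l(X) &= \prod_{s \in \{l, \ldots, k\} \setminus \{i_1, \ldots, i_r\}} q_s \quad \text{and} \\
B'_l(X) &= \prod_{s \in \{l, \ldots, k\} \setminus \{i_1, \ldots, i_r\}} q_{2k+m+1-s},
\end{aligned}
\]

with similar definitions for $X^*$ (obtained by replacing $X$ with $X^*$ and
$q$ with $q^*$).

Note that $B_l(X) = B_l(X^*)$, $B'_l(X) = B'_l(X^*)$, $A'(X) = A(X^*)$ and
$A'(X^*) = A(X)$.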


Definition 2. Let $X = (q_1, \ldots, q_{2k+m})$, $2 \le m \le k$. Thus we have either
$m = 2T + 1$ for some $T > 0$ or $m = 2T + 2$ for some $T \ge 0$. We define
Wo(X) = F"*™(X),


t 


Sere —— 
W(X) = Fek Gai aed Oiey Lystes sly Qe HPT pe oy ete 
t 
atin 
1,...,1, detm4i-+->Q2k+m) 


for 1<t<T,m>2 and 
m 
= ——“— 
Wriit(X) = BORO gs. -+5 9k; 1, sey 1, dktm-+1)- ya ,Q2k+m): 


Definition 3. Let $X = (q_1, \ldots, q_{2k+m})$, $2 \le m \le k$ with either $m = 2T + 1$
for some $T > 0$ or $m = 2T + 2$ for some $T \ge 0$. We define

\[ M_t(X) = \left( \prod_{s=t+1}^{k} q_s \oplus \prod_{s=k+m+1}^{2k+m-t} q_s \right) \]

for $0 \le t \le T$.


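Definition 4. Let $X = (q_1, \ldots, q_{2k+m})$, $2 \le m \le k$. If $m = 2T + 2$ for some
$T \ge 0$, we define

\[
\begin{aligned}
R_T(X) = {} & p_{k+T+1}\, q_{k+m-T} \left( \prod_{s=T+1}^{k} q_s \oplus \prod_{s=k+m+1}^{2k+m-T-1} q_s \right) \\
& + q_{k+T+1}\, p_{k+m-T} \left( \prod_{s=k+m+1}^{2k+m-T} q_s \oplus \prod_{s=T+2}^{k} q_s \right).
\end{aligned}
\]

If $m > 2$ with either $m = 2T + 1$ for some $T > 0$ or $m = 2T + 2$ for some
$T > 0$, then for $0 \le t \le T - 1$ we define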


k 
Ri(X) = pr+t41Uk-+m—t ( II ds 


s=t+l 
mm+k—3t—3 
Py a (apetiorscs eel Ohana ) 
2k+m—-t 
+ dk+t+1Pk+m—t Il Is@ 
s=k+m-4+1 


mm+k—3t—-3 
i oe i (dt+2; s 00) Wk, Ukt+t4+2,+-- a) : 


I 


It will be convenient to make the following abbreviations: $A = A(X)$,
$A^* = A(X^*)$, $B_l = B_l(X)$, $B'_l = B'_l(X)$, $W_t(X) = W_t$, $W_t(X^*) = W_t^*$,
$M_t(X) = M_t$, $M_t(X^*) = M_t^*$, $R_t(X) = R_t$ and $R_t(X^*) = R_t^*$.
Propositions 16.2.1 and 16.2.2 below contain results for $M_t$ and $R_T$, which
are later used to prove a result for $W_0$ in Theorem 16.5.1 of Section 16.5. In
the proofs of Propositions 16.2.1, 16.2.2 and 16.4.1 and Theorem 16.5.1 we
assume $q_1 > q_{2k+m}$. Note that, by the symmetry of the formulas, reversing
the order of components in $X$ and $X^*$ would not change the values of $W_t$, $M_t$
and $R_t$ for $X$ and $X^*$. Therefore the assumption $q_1 > q_{2k+m}$ can be made
without loss of generality.


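Proposition 16.2.1 Let $X = (q_1, \ldots, q_{2k+m})$ be singular, $2 \le m \le k$, with
either $m = 2T + 1$ for some $T > 0$ or $m = 2T + 2$ for some $T \ge 0$. Then
\[ M_t > M_t^* \quad \text{for } 0 \le t \le T, \]
for any $X^*$.


Proof. Without loss of generality we can assume $q_1 > q_{2k+m}$. Note that
$m \ge 2$ and so $t + 1 \le m$ for all $0 \le t \le T$. We have

\[
\begin{aligned}
M_t &= (A B_{t+1} \oplus A^* B'_{t+1}), \\
M_t^* &= (A^* B_{t+1} \oplus A B'_{t+1}), \\
M_t - M_t^* &= (A - A^*)(B_{t+1} - B'_{t+1}),
\end{aligned}
\]

where $A - A^* > 0$ and $B_{t+1} - B'_{t+1} > 0$ by the singularity of $X$, and so
\[ M_t - M_t^* > 0, \]
proving the proposition.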
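The factorization used in this proof relies only on the definition of the operation $a \oplus b = a + b - ab$: the product terms of the two expressions coincide and cancel. The sketch below checks the identity on random rational inputs; the variable names mirror the symbols $A$, $A^*$, $B_{t+1}$ and $B'_{t+1}$, and the sampling scheme is an arbitrary choice made for illustration.

```python
from fractions import Fraction as Fr
import random


def oplus(a, b):
    """The operation a (+) b = a + b - ab used throughout the chapter."""
    return a + b - a * b


rng = random.Random(0)


def rand_prob():
    """A random rational number strictly between 0 and 1."""
    return Fr(rng.randint(1, 99), 100)


identity_holds = []
sign_holds = []
for _ in range(200):
    A, A_star, B, B_prime = (rand_prob() for _ in range(4))
    M = oplus(A * B, A_star * B_prime)
    M_star = oplus(A_star * B, A * B_prime)
    # The cross terms A*A_star*B*B_prime cancel in the difference.
    identity_holds.append(M - M_star == (A - A_star) * (B - B_prime))
    if A > A_star and B > B_prime:
        sign_holds.append(M > M_star)

print(all(identity_holds), all(sign_holds))
```

The second check reflects how singularity is used: once both factors are positive, the difference is positive.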


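Proposition 16.2.2 Let $X = (q_1, \ldots, q_{2k+m})$ be singular, $2 \le m \le k$, with
$m = 2T + 2$ for some $T \ge 0$. Then
\[ R_T > R_T^* \]
for any $X^*$.


Proof. Without loss of generality we can assume $q_1 > q_{2k+m}$. Note that
$m \ge 2$ and so $T + 2 \le m$. We have

\[
\begin{aligned}
R_T = {} & p_{k+T+1}\, q_{k+m-T} \left( q_{T+1} A B_{T+2} \oplus A^* B'_{T+2} \right) \\
& + q_{k+T+1}\, p_{k+m-T} \left( q_{2k+m-T} A^* B'_{T+2} \oplus A B_{T+2} \right), \\
R_T^* = {} & p_{k+T+1}\, q_{k+m-T} \left( q_{T+1} A^* B_{T+2} \oplus A B'_{T+2} \right) \\
& + q_{k+T+1}\, p_{k+m-T} \left( q_{2k+m-T} A B'_{T+2} \oplus A^* B_{T+2} \right)
\end{aligned}
\]

and

\[
\begin{aligned}
R_T - R_T^* = {} & q_{k+T+1}\, p_{k+m-T} (B_{T+2} - q_{2k+m-T} B'_{T+2})(A - A^*) \\
& - q_{k+m-T}\, p_{k+T+1} (B'_{T+2} - q_{T+1} B_{T+2})(A - A^*),
\end{aligned}
\]

where by the singularity of $X$

\[
\begin{aligned}
& B_{T+2} - q_{2k+m-T} B'_{T+2} > 0, \\
& A - A^* > 0, \\
& q_{k+T+1}\, p_{k+m-T} > q_{k+m-T}\, p_{k+T+1}, \\
& B_{T+2} - q_{2k+m-T} B'_{T+2} > B'_{T+2} - q_{T+1} B_{T+2},
\end{aligned}
\]

and so
\[ R_T - R_T^* > 0, \]
proving the proposition.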


16.3 Preliminaries to the main proposition 


Lemmas 16.3.1-16.3.3 below contain some preliminary results which are used
in the proof of a result for $R_t$ in Proposition 16.4.1 of Section 16.4.


Lemma 16.3.1 Let $X = (q_1, \ldots, q_{2k+m})$ be singular, $2 \le m \le k$, with either
$m = 2T + 1$ for some $T > 0$ or $m = 2T + 2$ for some $T \ge 0$. Then for any
$X^*$ and all $0 \le t \le T - 1$ we have


k m-1 
[a= 42. a (16.1) 
s=t+1 s=t+1 
mm+k—3t—3 
FA ale (QR ppt er Ghee AORN Ty pee eo) 
ns pee ) 
= Ly—2t-2 Dk+t+29 ++ +5 Ik+m—t—1) (2k4+t+2)+++>92k+m-—t-1 
2k+t+1 
I] « 
s=k+m+1 
_ A2(m—2t—-2) 
— “"m—2t—2 (k+t+2; soe Uk+m—t—-1) W2k+t+2)-+-+5 G2k-+m—t—1) : 
Qk+t+1 
is 
A*B,, TI as. (16.2) 
s=2k+2 
k
mam+k—3t—3 
Pee CaS reer | ee ae) ee ee mee) ae ee a II s 
s=t+1 
_ 2(m—2t-2) 
— #'m—2t—2 (dh4t425 tee Ukt+m—t—1) W2k4+t42)-+++5 d2k+m-—t—1) : 
m-1 Qk+t+1 
Ei 
AB AB) TL el ll aes (16.3) 
s=t+1 s=2k4+2 
k m-l1 
Ibe=| Ile) 43s. (16.4) 
s=t+1 s=t+l1 
2(m—2t—2) /_« x x x 
Prat (Gestta nere »Wkim—t—1) dki+m-—-1) aces, jsp) 
*> fralm—2t-2) 
— £'m—2t—2 (Qk+t+2; soe Uk+m—t—1) W2k+t+2)-+-+5 G2k-+m—t—1) 
Qk+t+1 
y 
AB, |[ 4s: (16.5) 
s=2k+2 
k 
m+k—3t—3/* * * * * 
ea (Gepepar: <-> Uktm—t—1 Uk+m-41) +++) Qk-+m—t-1) II Ws 
s=t+1 
_ 2(m—2t—2) 
— “m—2t—2 (Gk-+t425 see Uk4+m—t—1) W2k4t425-+-+5 Dein ea) : 
m-1 Qk+t+1 
/ 
ABA Be Y Doe) Tas (16.6) 
s=t+1 s=2k4+2 
and 
k 
m+k—3t—3 
sae (Gh+t+2; s+) Qk+m—t—-1)Uk+m41)-++- Ose tad) II ds 
s=t+l1 
k 
CS m+k—3t—3/* * * * * 
= Fey (Ge+t+2> 209 Ik+m—t—19 Uk+m419°+ +9 Qk+m—t-1) II qs: 
s=t+l 
(16.7) 


Proof. Without loss of generality we can assume $q_1 > q_{2k+m}$.

We have $m \ge 2$ and so $t+1 < m$ for all $0 \le t \le T-1$. Therefore (16.1)
is satisfied. Also, consider that in

\[ \bar F_{k-t-1}^{m+k-3t-3}(q_{k+t+2},\ldots,q_{k+m-t-1},q_{k+m+1},\ldots,q_{2k+m-t-1}) \]

we have $2(k-t-1) > m+k-3t-3$. Therefore every event in which
$k-t-1$ consecutive components fail will include failure of the components
$q_{k+m+1},\ldots,q_{2k+t+1}$. Hence (16.2) follows. From (16.1) and (16.2) we have
(16.3).

In a similar manner we show (16.4)-(16.6).

From (16.3) and (16.6), we have (16.7) and the lemma follows.


Lemma 16.3.2 Let $X = (q_1,\ldots,q_{2k+m})$ be singular, $2 \le m \le k$, with either
$m = 2T+1$ for some $T > 0$ or $m = 2T+2$ for some $T \ge 0$. Then for any
$X^*$ and all $0 \le t \le T-1$ we have


2k+m—-t 2k+m—-t 
Tl. a=43,. TT a (16.8) 
s=k+m+1 s=2k+2 
Eee 3 (gus, »++5 Qk, Wk+t+25--- »Uk+m—t-1) 
k 
2 2t—2 
= po? 2 ‘dese, e005 Im—t—1) VUk+t425+-+ + Uk+m-—t-1) ii ds 
s=m-t 
m-1 
v4 2t—2 
= = Poin? 2 Tceee sey Im—t—1) WUkt+t+25-++5 dk+m—t-1)ABm II ds 
s=m-t 
(16.9) 
2k+m—-t 
k—3t-—3 
Lee 1 (dt+2; +029 Gk, Uk+t+2>--- »Uk-+m—t—1) Il ds 
s=k+m+1 
(m—2t— *)( * 
= Fo (Mth2s- +s Im—t—1) Uhtt$25+ ++ Te+m—t-1) ABm A*B 
mai t 
: w) J i ds (16.10) 
s=2k+2 s=m—-t 
2k+m-—-t 2k+m—-t 
I] Nie (16.11) 
s=k+m+1 s=2k+2 
+k—3t—3/ x * * 
Fe t-1 (Qi+2, tees dks Tk+t+20°°° Opin 3 4) 
m-1 
2(m—2t—2 * 
= poe ‘Guo: +++) OUm—t—-1, Wk4+t4+25--+; dktm—t—1)A Bn II ds, 
s=m-t 
(16.12) 
2k+m—-t 
+k—3t—3/_* * * * * 
fale ( t+29°°°9 4k Wk+t+20°° »Te+m—t—1) II ds 
s=k+m-+1 
A2(m—2t—2 “ 
= fora Gis, ++) Um—t—1) Wk4+t425--+5 dk+m—t-1)ABmA B,, 


2k+m—t m-1 
( II s) ds 
s 


(16.13) 
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and 
2k+m—-t 
am+k—3t—3 
yep esoueiraatmad C28 enraere eat | earn emeere | wea er ik ds 
s=k+m-+1 
2k+m—t 
_— BFm+k—3t—3/_* * ok * *« 
= eT (CAS 2299 Oks Wk4t420-+ +> Oto a) al qs- 
s=k+m+1 


(16.14) 


Proof. Note that Lemma 16.3.1 is also true for designs reversed to $X$ and
$X^*$, that is, for $X_r = (q_{2k+m},\ldots,q_1)$ and $X_r^* = (q^*_{2k+m},\ldots,q^*_1)$. Therefore
all equalities of Lemma 16.3.2 are satisfied.


Lemma 16.3.3 Let $Y = (q_{2k},\ldots,q_1)$, $k \ge 2$, let $(q_k,\ldots,q_1)$ be singular with
$q_1 < q_k$, and let $Y' = (q_{2k},\ldots,q_{k+1},q_1,\ldots,q_k)$. Then

\[ \bar F_k^{2k}(Y) > \bar F_k^{2k}(Y'). \]

Proof. For $1 \le i \le [k/2]$, let $Y^i$ be obtained from $Y$ by interchanging
components $i$ and $k+1-i$. It is sufficient to show

\[ \bar F_k^{2k}(Y) > \bar F_k^{2k}(Y^i), \]

that is, that the system $Y$ is improved by interchanging the components at
positions $i$ and $k+1-i$. Note that by the singularity of $Y$ we have $q_i < q_{k+1-i}$.

Malon in [26] (for F systems) and Kuo et al. in [24] (for G systems) have
shown that if in a linear consecutive-k-out-of-n system we have $p_i > p_j$ (or
equivalently $q_i < q_j$) for some $1 \le i < j \le \min\{k, n-k+1\}$, then the system
is improved by interchanging components $p_i$ and $p_j$ (equivalently $q_i$ and $q_j$).
Applying this result to systems $Y$ and $Y^i$, we have

\[ \bar F_k^{2k}(Y) > \bar F_k^{2k}(Y^i), \]

proving the lemma.


16.4 The main proposition 


Proposition 16.4.1 below contains a result for $R_t$ which is later used in the
proof of a result for $W_0$ in Theorem 16.5.1 of Section 16.5.


Proposition 16.4.1 Let $X = (q_1,\ldots,q_{2k+m})$ be singular, $2 \le m \le k$, with
either $m = 2T+1$ for some $T > 0$ or $m = 2T+2$ for some $T \ge 0$. Then

\[ R_t > R_t^* \quad \text{for } 0 \le t \le T-1 \]

for any $X^*$.


Proof. Without loss of generality we can assume q1 > q2k+m- Define 


k 
R,(X) = Peavetein-i( II ds 


s=t+l1 


mm+k—3t—3 
+ ET (Ge+t+2; +129 Qk+m—t—-1)Uk+m41s+++s a) 


2k+m—-t 


re astern Il ds 


s=k+m+1 


mm+k—3t—3 
r yay (G42; +05 Qk) Wk4+t42>-°- a) ; 


with a similar formula for R,(X*). Put R,(X) = R; and R,(X*) = R*. 
Note that in the formulas for R; and Rf the following equalities are satisfied: 
Gk+t+1 = Gatgi Uk+m—t = Gam—t> (16.7) of Lemma 16.3.1 and (16.14) of 


Lemma 16.3.2. Hence to prove that R; > Rj, it is sufficient to show ae Re. 
Define 


m—-1 2k+m—-t 
/ 
T= II a; LS ih ds; 
s=t+1 s=2k4+2 
_ -a2(m—2t—2) 
Uj = Foe (Gey dys + Oem L iy Gongeos + +s Gok d—4) and 
_ A2(m—2t—2) 
Up = pte 9 Gt4+25+++5QUm—t—15 Wk4t42)--+, dhim—taa) 


Applying results (16.1), (16.2), (16.4) and (16.5) of Lemma 16.3.1, and
(16.8), (16.9), (16.11) and (16.13) of Lemma 16.3.2, we have


ey oe eee 


2 2k+t+1 
Re = Pr+t+1dk-+m-—t (742, +A*B,,U1 |] :) 


s=2k+2 


m-1 
+ dk+t+1Pk+m-t (wer + AB,,U2 a «) ; 


s=m—-t 


2k+t+1 
Re = Pett+19k+m-—t (r42, + AB,,U1 II s) 
s=2k+2 


m-1 
+ dk+t+1Pk+m-t (12 + A*B,,U2 i s) ; 


s=m—-t 
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Re — Rt = aett41Peem-t (Bats Il ds — r'P, (A — A*) 


s=m-t 
Qk+t+1 
— Gk+m—tPk+t+41 (ain Tf qs — Pn) (A —A*). 
s=2k+2 


Note that by the singularity of X 


Ok+t+1Pk+m—t > Ik+m—tPk+t4+1; 


A-A*>0, 
BSB. 
T>T and 
m1 2k+t+2 
do [], sda, 
s=m-t s=2k4+2 


where the equalities include the assumption [[j qs = 1 , and 


m1 m—-1 m—t—1 
v TI «> (TI s) ile 
s=m-t s=m-t s=t+2 
m—1 
= II a 
s=t+2 
2k+m—t—1 
> | « 
s=2k+2 

a 


From (16.18) and (16.21) it follows that 


m1 
BnU2 [] a—-T By, >0. 
s=m-t 
Next, by Lemma 16.3.3 
Foe ot Gee sey Om—t—-1) Wk4+t42)+++5 Giene-F 1) 
> Fela rene ue +++ Um—t—-1) Uk+m-t-1)-++- + Tk+t+2)s 
and since 


Gt+2 > G2k+m—t—-1y+++>dm—t-1 > G2k+4t42, 


(16.15) 


16.16 
16.17 
16.18 
16.19 


Rae Se a, 


(16.20) 


(16.21) 


(16.22) 
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we have 
12 (m—2t—2) 
Pe t_2 (G42; +++) OUm—t—-1)WUk+m-t-1)-++ Uk-+t-+2) 
=2(m—2t—2) 
SF oe Oakes hoy COR Uhr ae) 


(16.23) 


From (16.22) and (16.23) it follows that 


m2(m—2t—2) 

Fn 22 (G42, +++) Om—t—-1) Wk4+t4+25---; dk+m—t—1) 
=2(m—2t—2) 

SE pao. VOI Oya 15, Dea ed fs en ha 


that is, 
Uz > Uj. 


From (16.18)-(16.20) and (16.24) we have 


m-1 2k+t+1 
BnU2 |] a—-TBy>BnUi |] a—TBm- 
s=m-t s=2k+2 


Considering (16.15), we conclude by (16.16), (16.17), (16.22) and (16.24) that 


R, > Rr, 


proving the proposition. 


16.5 Theorems 


Theorem 16.5.1 below states that if $X$ is a singular design of a linear
consecutive-k-out-of-(2k + m):F system with $2 \le m \le k$, then for any
nonsingular design $X^*$ obtained from $X$ by interchanging symmetrical
components (as defined in Section 16.1.4), $X^*$ is a better design.


Theorem 16.5.1 Let $X = (q_1,\ldots,q_{2k+m})$ be singular, $2 \le m \le k$, with
either $m = 2T+1$ for some $T > 0$ or $m = 2T+2$ for some $T \ge 0$. Then $X^*$
is nonsingular and

\[ \bar F_k^{2k+m}(X) > \bar F_k^{2k+m}(X^*) \]

for any $X^*$.


Proof. Clearly, $X^*$ is nonsingular. Without loss of generality we can assume
$q_1 > q_{2k+m}$. Proceeding by induction, we shall prove that $W_0 > W_0^*$.

STEP 1. For $2 \le m = k$ we have $W_{T+1} = W_{T+1}^* = 1$. For $2 \le m < k$ we
have

\[ W_{T+1} = \bar F_{k-m}^{2(k-m)}(q_{m+1},\ldots,q_k,q_{k+m+1},\ldots,q_{2k}), \]

with a similar formula for $W_{T+1}^*$. By Theorem 16.5.1 (see also O'Reilly [28]) it
follows that $W_{T+1} \ge W_{T+1}^*$, with equality if and only if either $\{q_{m+1},\ldots,q_k\}$
is a subset of $\{q_{i_1},\ldots,q_{i_r}\}$ or the intersection of those sets is empty. Either
way we have

\[ W_{T+1} \ge W_{T+1}^*. \]

STEP 2. Note that if $m = 2T+1$, then

\[ W_T = p_{k+T+1}M_T + q_{k+T+1}W_{T+1}, \]

with a similar formula for $W_T^*$, where $q_{k+T+1} = q^*_{k+T+1}$. We have $M_T > M_T^*$
by Proposition 16.2.1 and $W_{T+1} \ge W_{T+1}^*$ by Step 1, and so it follows that
$W_T > W_T^*$.

If $m = 2T+2$, then

\[ W_T = p_{k+T+1}p_{k+m-T}M_T + q_{k+T+1}q_{k+m-T}W_{T+1} + R_T, \]

with a similar formula for $W_T^*$, where $q_{k+T+1} = q^*_{k+T+1}$, $q_{k+m-T} = q^*_{k+m-T}$,
$M_T > M_T^*$ by Proposition 16.2.1, $W_{T+1} \ge W_{T+1}^*$ by Step 1 and $R_T > R_T^*$ by
Proposition 16.2.2. Hence $W_T > W_T^*$.

Either way we have

\[ W_T > W_T^*. \]

If $m = 2$, then $T = 0$ and so $W_0 > W_0^*$, completing the proof for $m = 2$.

Consider $m > 2$.

STEP 3. Suppose that $W_{t+1} > W_{t+1}^*$ for some $0 \le t \le T-1$. We shall
show that then $W_t > W_t^*$.

We have

\[ W_t = p_{k+t+1}p_{k+m-t}M_t + q_{k+t+1}q_{k+m-t}W_{t+1} + R_t, \]

with a similar formula for $W_t^*$, where $q_{k+t+1} = q^*_{k+t+1}$, $q_{k+m-t} = q^*_{k+m-t}$,
$M_t > M_t^*$ by Proposition 16.2.1, $W_{t+1} > W_{t+1}^*$ by the inductive assumption
and $R_t > R_t^*$ by Proposition 16.4.1. It follows that

\[ W_t > W_t^*. \]

From Steps 2-3 and mathematical induction we have

\[ W_0 > W_0^*, \]

proving the theorem.


The following corollary is a direct consequence of Theorem 16.5.1.

Corollary 16.5.1 A necessary condition for the optimal design of a linear
consecutive-k-out-of-(2k + m):F system with $2 \le m \le k$ is for it to be
nonsingular.


Theorem 16.5.2 below states that if $X$ is a singular design of a linear
consecutive-k-out-of-(2k + m):G system with $q_1 > q_{2k+m}$, $2 \le m \le k$, then
for any nonsingular design $X^*$ obtained from $X$ by interchanging symmetrical
components (as defined in Section 16.1.4), $X$ is a better design.


Theorem 16.5.2 Let $X = (q_1,\ldots,q_{2k+m})$ be singular, $2 \le m \le k$. Then
$X^*$ is nonsingular and

\[ G_k^{2k+m}(X) > G_k^{2k+m}(X^*) \]

for any $X^*$.


Proof. This theorem for G systems can be proved in a manner similar to the 
proof of Theorem 16.5.1 for F systems. That is, by giving similar definitions 
for G systems, similar proofs for lemmas and propositions for G systems 
can be given. Alternatively, the theorem can be proved by applying only 
Theorem 16.5.1, as below. 

Clearly, X* is nonsingular. Define p; = q; for all 1 <i < 2k+m. Then we 
have 


Gee) = Br prs we ,P2k+m); 
Gt (X*) = FEET" (Bh, Pant): 


where (p1,...,P2k4+m) is singular, and so by Theorem 16.5.1 


m2k+ = = m2k —* =k 
E Diss ee Doetan) Si, add (cee em 
proving the theorem. 


Corollary 16.5.2 Let $Y = (q_1,\ldots,q_{2k+m})$ be the optimal design of a linear
consecutive-k-out-of-(2k + m):G system with $2 \le m \le k$. If

\[ (q_1,\ldots,q_{m-1},q_{k+1},\ldots,q_{k+m},q_{2k+2},\ldots,q_{2k+m}) \]

is singular, then $Y$ must be singular too.


Proof. Suppose $Y$ is not singular. Let $Z$ be a singular design obtained from $Y$
by an operation in which we allow the interchange of only those symmetrical
components which are in places $m,\ldots,k,k+m+1,\ldots,2k+1$. Then $Z$ and
$Y$ satisfy the conditions of Theorem 16.5.2, and so

\[ G_k^{2k+m}(Z) > G_k^{2k+m}(Y). \]

Hence $Y$ is not optimal, and by contradiction the corollary is proved.


16.6 Procedures to improve designs not satisfying
necessary conditions for the optimal design


We have shown that a necessary condition for the optimal design of a linear
consecutive-k-out-of-n:F system with $2k+2 \le n \le 3k$ is for it to be
nonsingular (Corollary 16.5.1 of Section 16.5), which is similar to the case
$2k \le n \le 2k+1$ treated in [28]. However, the procedures given in [28] cannot
be implemented in this case. This is due to the restriction placed on
the choice of interchanged symmetrical components ($(3m-2)$ components are
excluded from the interchange).

The following procedure is a consequence of Theorem 16.5.1.


Procedure 16.6.1 In order to improve a singular design of a linear
consecutive-k-out-of-(2k + m):F system with $2 \le m \le k$, apply the following steps:


1. select an arbitrary nonempty set of pairs of symmetrical components so
that the first component in each pair is in a position from $m$ to $k$; and
then

2. interchange the two components in each selected pair.


Note that the number of possible choices in Step 1 is $2^{(k-m+1)} - 1$.
Consequently, the best improvement can be chosen or, if the number of possible
choices is too large to consider all options, the procedure can be repeated as
required.
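
The enumeration in Step 1 can be sketched in Python. This is only an illustrative sketch, not the program used for this chapter: it assumes that the symmetrical partner of position $i$ in a design of length $n = 2k + m$ is position $n + 1 - i$ (the definition itself is given in Section 16.1.4), it evaluates unreliability by brute force, and the names `f_unreliability`, `candidate_designs` and the sample values of `q` are hypothetical.

```python
from itertools import combinations, product


def f_unreliability(q, k):
    """Brute-force probability that at least k consecutive components fail.

    q[i] is the failure probability of the component in position i + 1.
    Exponential in len(q); intended only for small illustrative systems.
    """
    total = 0.0
    for state in product((0, 1), repeat=len(q)):  # 1 marks a failed component
        run = longest = 0
        for failed in state:
            run = run + 1 if failed else 0
            longest = max(longest, run)
        if longest >= k:
            prob = 1.0
            for failed, qi in zip(state, q):
                prob *= qi if failed else 1.0 - qi
            total += prob
    return total


def candidate_designs(q, k, m):
    """Yield every design obtainable by Steps 1 and 2 of the procedure."""
    n = len(q)  # assumed to equal 2k + m
    positions = range(m, k + 1)  # first component of each admissible pair
    for size in range(1, len(positions) + 1):
        for chosen in combinations(positions, size):
            design = list(q)
            for i in chosen:
                j = n + 1 - i  # assumed symmetrical partner of position i
                design[i - 1], design[j - 1] = design[j - 1], design[i - 1]
            yield design


k, m = 3, 2
q = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # hypothetical unreliabilities
candidates = list(candidate_designs(q, k, m))
best = min(candidates, key=lambda d: f_unreliability(d, k))
print(len(candidates), best)
```

The number of candidates produced equals the count $2^{(k-m+1)} - 1$ stated above, since each nonempty subset of the positions $m,\ldots,k$ gives one choice.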

Because the result for systems with $2k+2 \le n \le 3k$ excludes some components,
it is not possible to derive from it, unlike the case when $2k \le n \le 2k+1$,
that it is necessary for the optimal design of a linear consecutive-k-out-of-n:G
system to be singular. However, as stated in Corollary 16.5.2 of Section
16.5, if a subsystem composed of those excluded components is singular, then
the whole system has to be singular for it to be optimal. Consequently, the
following procedure can be applied. Note that, for a given nonsingular design,
the number of possible singular designs produced in this manner is 1.


Procedure 16.6.2 Suppose a design of a linear consecutive-k-out-of-(2k + m):G
system is nonsingular, with $2 \le m \le k$. Consider its subsystem composed
of components in positions from 1 to $(m-1)$, from $(k+1)$ to $(k+m)$,
and from $(2k+2)$ to $(2k+m)$, in order as in the design. If such a subsystem
is singular, then in order to improve the design, interchange all required
symmetrical components so that the design becomes singular.


The following examples, calculated using a program written in C++, are
given in order to illustrate the fact that both nonsingular and singular optimal
designs of linear consecutive-k-out-of-n:G systems exist.


Example 1. $(q_1,q_5,q_7,q_9,q_8,q_6,q_4,q_3,q_2)$ is a nonsingular optimal design of
a linear consecutive-3-out-of-9:G system. It is optimal for $q_1 = 0.151860$,
$q_2 = 0.212439$, $q_3 = 0.304657$, $q_4 = 0.337662$, $q_5 = 0.387477$, $q_6 = 0.600855$,
$q_7 = 0.608716$, $q_8 = 0.643610$ and $q_9 = 0.885895$.

Example 2. $(q_1,q_3,q_4,q_5,q_7,q_9,q_8,q_6,q_2)$ is a singular optimal design of a
linear consecutive-3-out-of-9:G system. It is optimal for $q_1 = 0.0155828$,
$q_2 = 0.1593690$, $q_3 = 0.3186930$, $q_4 = 0.3533360$, $q_5 = 0.3964650$, $q_6 = 0.4465830$,
$q_7 = 0.5840900$, $q_8 = 0.8404850$ and $q_9 = 0.8864280$.


Chapter 17


The (k + 1)-th component of linear
consecutive-k-out-of-n systems


Malgorzata O'Reilly


Abstract A linear consecutive-k-out-of-n:F system is an ordered sequence
of n components that fails if and only if at least k consecutive components
fail. A linear consecutive-k-out-of-n:G system is an ordered sequence of n
components that works if and only if at least k consecutive components work.

The existing necessary conditions for the optimal design of systems with 
2k < n provide comparisons between reliabilities of components restricted 
to positions from 1 to k and positions from n to (n — k + 1). This chapter 
establishes necessary conditions for the variant optimal design that involve 
components at some other positions, including component (k+1). Procedures 
to improve designs not satisfying those conditions are also given. 


Key words: Linear consecutive-k-out-of-n:F system, linear
consecutive-k-out-of-n:G system, variant optimal design, singular design,
nonsingular design


17.1 Introduction 


For the description of the mathematical model of the system discussed here, 
including nomenclature, assumptions and notation, the reader is referred to 
[9], also appearing in this volume. 

Zuo and Kuo [16] have proposed three methods for dealing with the variant
optimal design problem: a heuristic method, a randomization method and a
binary search method. The heuristic and randomization methods produce
suboptimal designs of consecutive-k-out-of-n systems; the binary search
method produces an exact optimal design.

The heuristic method [16] is based on the concept of Birnbaum importance,
which was introduced by Birnbaum in [1]. The Birnbaum reliability
importance $I_i$ of component $i$ is defined by the following formula:

\[ I_i = R(p_1,\ldots,p_{i-1},1,p_{i+1},\ldots,p_n) - R(p_1,\ldots,p_{i-1},0,p_{i+1},\ldots,p_n), \]

where $R$ stands for the reliability of a system.
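
As a small illustration of this definition, the following Python sketch evaluates $I_i$ for a linear consecutive-k-out-of-n:F system by brute-force enumeration of component states. It is a sketch under stated assumptions rather than a method from the cited papers: the function names and the sample reliabilities are hypothetical, and the enumeration is exponential in $n$, so it is suitable only for small systems.

```python
from itertools import product


def f_system_reliability(p, k):
    """Reliability of a linear consecutive-k-out-of-n:F system.

    p[i] is the reliability of the component in position i + 1. The system
    works unless at least k consecutive components fail (brute force).
    """
    total = 0.0
    for state in product((0, 1), repeat=len(p)):  # 1 marks a working component
        run = longest = 0
        for works in state:
            run = 0 if works else run + 1  # current run of failures
            longest = max(longest, run)
        if longest < k:
            prob = 1.0
            for works, pi in zip(state, p):
                prob *= pi if works else 1.0 - pi
            total += prob
    return total


def birnbaum_importance(p, k, i):
    """I_i = R(..., p_i = 1, ...) - R(..., p_i = 0, ...), with i 1-based."""
    up, down = list(p), list(p)
    up[i - 1], down[i - 1] = 1.0, 0.0
    return f_system_reliability(up, k) - f_system_reliability(down, k)


p = [0.9, 0.8, 0.7, 0.85, 0.95]  # hypothetical component reliabilities
importances = [birnbaum_importance(p, 2, i) for i in range(1, len(p) + 1)]
print(importances)
```

The heuristic idea discussed in this section would then pair larger entries of `importances` with more reliable components.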

The heuristic method [16] implements the idea that a component with a 
higher reliability should be placed in a position with a higher Birnbaum im- 
portance. Based on Birnbaum’s definition, Papastavridis [10] and Kuo, Zhang 
and Zuo [5] defined component reliability functions for consecutive-k—out— 
of-n systems. Zakaria, David and Kuo [13], Zuo [14] and Chang, Cui and 
Hwang [3] have established some comparisons of Birnbaum importance in 
consecutive-k—out—of—n systems. Zakaria et al. [13] noted that more reli- 
able components should be placed in positions with higher importance in a 
reasonable heuristic for maximizing system reliability. Zuo and Shen [17] de- 
veloped a heuristic method which performs better than the heuristic method 
of Zuo and Kuo [16]. 

The randomization method [16] compares a limited number of randomly
chosen designs and obtains the best amongst them. The binary search
method [16] has been applied only to linear consecutive-k-out-of-n:F
systems with n/2 < k < n. Both methods are based upon the following necessary
conditions for optimal design, proved by Malon [7] for linear
consecutive-k-out-of-n:F systems and extended by Kuo et al. [5] to linear
consecutive-k-out-of-n:G systems:


(i) components from positions 1 to $\min\{k, n-k+1\}$ are arranged in
nondecreasing order of component reliability;
(ii) components from positions $n$ to $\max\{k, n-k+1\}$ are arranged in
nondecreasing order of component reliability;
(iii) the $(2k-n)$ most reliable components are arranged from positions
$(n-k+1)$ to $k$ in any order if $n < 2k$.
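
Conditions (i)-(iii) depend only on the ordering of the component reliabilities, so they can be checked mechanically. The following Python sketch does this for a design given as a list of reliabilities; the function name `satisfies_end_conditions` and the sample designs are hypothetical and serve only as an illustration.

```python
def satisfies_end_conditions(p, k):
    """Check necessary conditions (i)-(iii) for a design.

    p is a list with p[i] the reliability of the component in position i + 1.
    """
    n = len(p)
    a = min(k, n - k + 1)
    b = max(k, n - k + 1)

    # (i) positions 1, 2, ..., a in nondecreasing order of reliability
    left = p[:a]
    ok_i = all(x <= y for x, y in zip(left, left[1:]))

    # (ii) positions n, n - 1, ..., b in nondecreasing order of reliability
    right = p[b - 1:][::-1]
    ok_ii = all(x <= y for x, y in zip(right, right[1:]))

    # (iii) if n < 2k, positions n - k + 1, ..., k hold the most reliable ones
    ok_iii = True
    if n < 2 * k:
        middle = p[n - k:k]
        rest = p[:n - k] + p[k:]
        ok_iii = (not rest) or min(middle) >= max(rest)

    return ok_i and ok_ii and ok_iii


print(satisfies_end_conditions([0.5, 0.6, 0.9, 0.7, 0.4], 2))
print(satisfies_end_conditions([0.6, 0.5, 0.9, 0.7, 0.4], 2))
```

A design that fails this check can be discarded from the set of candidate optimal designs without computing any system reliability.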


Pairwise rearrangement of components in a system has been suggested
as another method to enhance designs [2, 4, 6]. Other necessary conditions
have also been reported in the literature. Shen and Zuo [12] proved
that a necessary condition for the optimal design of a linear
consecutive-k-out-of-n:G system with $n \in \{2k, 2k+1\}$ is for it to be singular and
O'Reilly proved that a necessary condition for the optimal design of a linear
consecutive-k-out-of-n:F system with $n \in \{2k, 2k+1\}$ is for it to be
nonsingular [8]. Those results have been extended to the case $2k \le n \le 3k$
by O'Reilly in [9]. Procedures to improve designs not satisfying those necessary
conditions have also been provided ([8], Procedures 1-2; [9], Procedures
1-2).


17.2 Summary of the results


In this chapter we focus on variant optimal designs of linear consecutive—k— 
out—of-n systems and establish necessary conditions for the optimal designs 
of those systems. As an application of these results, we construct procedures 
to enhance designs not satisfying these necessary conditions. An improved 
design and its reverse, that is, a design with components in reverse order, are 
regarded in these procedures as equivalent. Although variant optimal designs 
depend upon the particular choices of component reliabilities, the necessary 
conditions for the optimal design of linear consecutive systems established 
here rely only on the order of component reliabilities and not their exact 
values. Therefore they can be applied in the process of eliminating nonop- 
timal designs from the set of potential optimal designs when it is possible 
to compare component reliabilities, without necessarily knowing their exact 
values. 

We explore the case $n > 2k$. The case $n \le 2k$ for G systems has been
solved by Kuo et al. [5] and the case $n \le 2k$ for F systems can be limited to
$n = 2k$ due to the result of Malon [7]. We can summarize this as follows.


Theorem 17.2.1 A design $X = (q_1,q_2,\ldots,q_n)$ is optimal for a linear
consecutive-k-out-of-n:F system with $n \le 2k$ if and only if


1. the $(2k-n)$ best components are placed from positions $(n-k+1)$ to $k$, in
any order; and

2. the design $(q_1,\ldots,q_{n-k},q_{k+1},\ldots,q_n)$ is optimal for a linear
consecutive-(n - k)-out-of-2(n - k):F system.


The existing necessary conditions for the optimal design of systems with
$n > 2k$ [5, 7] provide comparisons between the reliabilities of components
restricted to the positions from 1 to $k$ and the positions from $n$ to $(n-k+1)$.
In this chapter we develop necessary conditions for the optimal design of
systems with $n > 2k$ with comparisons that involve components at some
other positions, including the $(k+1)$-th component. The following conditions
are established as necessary for the design of a linear consecutive system to
be optimal (stated in Corollaries 17.3.1, 17.4.1 and 17.5.1 respectively):


• $q_1 > q_{k+1}$ and $q_n > q_{n-k}$ for linear consecutive-k-out-of-n:F and
consecutive-k-out-of-n:G systems with $n > 2k$, $k \ge 2$;

• $\min\{q_1, q_{2k+1}\} > q_{k+1} > \max\{q_k, q_{k+2}\}$ for linear
consecutive-k-out-of-n:F systems with $n = 2k+1$, $k \ge 2$;

• $(q_1,q_{k+1},q_{k+2},q_{2k+2})$ is singular and $(q_1,\ldots,q_k,q_{k+3},\ldots,q_{2k+2})$
nonsingular for linear consecutive-k-out-of-n:F systems with $n = 2k+2$,
$k \ge 2$.


Further, procedures to improve designs not satisfying these conditions are
given. Whereas the first of the conditions is general, the other two conditions
compare components in places other than only the $k$ left-hand or $k$ right-hand
places for systems with $n > 2k$, unlike what has been considered in
the literature so far. Zuo [15] proved that for linear consecutive-k-out-of-n
systems with components with common choices of reliability $p$ and $n > 2k$,
$k \ge 2$, we have

\[ I_1 < I_{k+1}, \]

where $I_i$ stands for Birnbaum reliability importance. Lemmas 17.3.1-17.3.2
and Corollary 17.3.1 give a stronger result, which also allows the component
unreliabilities to be distinct.

Suppose $X$ is a design of a linear consecutive system with $n > 2k$. Let $i$
and $j$ be the intermediate components with $k \le i < j \le n-k+1$. From the
results of Koutras, Papadopoulos and Papastavridis [4] it follows that such
components are incomparable in a sense that the information $q_i > q_j$ is not
sufficient for us to establish whether pairwise rearrangement of components
$i$ and $j$ improves the design. However, as we show in this chapter, this does
not necessarily mean that we cannot determine, as a necessary condition,
which of the components $i$ and $j$ should be more reliable for the design to be
optimal. In the proofs of Propositions 17.4.2 and 17.5.2 we apply the following
recursive formula of Shanthikumar [11]:

\[ \bar F_k^n(q_1,\ldots,q_n) = \bar F_k^{n-1}(q_1,\ldots,q_{n-1}) + p_{n-k}q_{n-k+1}\cdots q_n\left[1 - \bar F_k^{n-k-1}(q_1,\ldots,q_{n-k-1})\right]. \]
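
The recursion above can be sketched in Python as follows. The boundary conventions are assumptions that the formula itself does not state: the unreliability of fewer than $k$ components is taken to be 0, and for exactly $k$ components it is the product of their failure probabilities. The names `f_recursive` and `f_bruteforce` are hypothetical; the brute-force routine is included only to cross-check the recursion on small systems.

```python
from itertools import product


def f_recursive(q, k):
    """Unreliability of a linear consecutive-k-out-of-n:F system by recursion.

    q[i] is the failure probability of the component in position i + 1.
    F[j] is the unreliability of the subsystem of the first j components.
    """
    n = len(q)
    F = [0.0] * (n + 1)  # assumed convention: F[j] = 0 for j < k
    for j in range(k, n + 1):
        tail = 1.0
        for x in q[j - k:j]:  # q_{j-k+1} ... q_j
            tail *= x
        if j == k:
            F[j] = tail
        else:
            p_before = 1.0 - q[j - k - 1]  # p_{j-k}
            F[j] = F[j - 1] + p_before * tail * (1.0 - F[j - k - 1])
    return F[n]


def f_bruteforce(q, k):
    """Same quantity by enumerating all component states (small n only)."""
    total = 0.0
    for state in product((0, 1), repeat=len(q)):  # 1 marks a failed component
        run = longest = 0
        for failed in state:
            run = run + 1 if failed else 0
            longest = max(longest, run)
        if longest >= k:
            prob = 1.0
            for failed, qi in zip(state, q):
                prob *= qi if failed else 1.0 - qi
            total += prob
    return total


q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # hypothetical unreliabilities
print(f_recursive(q, 2), f_bruteforce(q, 2))
```

The recursion takes a number of steps proportional to $nk$, whereas the enumeration grows as $2^n$.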


17.3 General result for $n > 2k$, $k \ge 2$


We shall make use of the following notation.


Definition 1. Let $X = (q_1,\ldots,q_n)$ ($X = (p_1,\ldots,p_n)$). We define $X^{i;j}$ to be
a design obtained from $X$ by interchanging components $i$ and $j$.


Propositions 17.3.1 and 17.3.2 below contain preliminary results to
Lemmas 17.3.1 and 17.3.2, followed by Corollary 17.3.1 which states a necessary
condition for the optimal design of linear consecutive-k-out-of-n:F and
linear consecutive-k-out-of-n:G systems with $n > 2k$, $k \ge 2$.


Proposition 17.3.1 Let $X = (q_1,\ldots,q_n)$, $n > 2k$, $k \ge 2$. Then

\[ \bar F_k^n(X) - \bar F_k^n(X^{1;k+1}) = q_{k+2}(q_{k+1} - q_1)\left[\bar F_k^{n-1}(q_2,\ldots,q_k,1,1,q_{k+3},\ldots,q_n) - \bar F_k^{n}(1,q_2,\ldots,q_k,0,1,q_{k+3},\ldots,q_n)\right]. \]

Proof. By the theorem of total probability, conditioning on the behavior of
the items in positions 1 and $k+1$, we have that

\begin{align*}
\bar F_k^n(X) ={}& q_1q_{k+1}\bar F_k^n(1,q_2,\ldots,q_k,1,q_{k+2},\ldots,q_n) \\
&+ p_1p_{k+1}\bar F_k^n(0,q_2,\ldots,q_k,0,q_{k+2},\ldots,q_n) \\
&+ (p_1q_{k+1})q_{k+2}\bar F_k^{n-1}(q_2,\ldots,q_k,1,1,q_{k+3},\ldots,q_n) \\
&+ (p_1q_{k+1})p_{k+2}\bar F_k^{n-1}(q_2,\ldots,q_k,1,0,q_{k+3},\ldots,q_n) \\
&+ (q_1p_{k+1})q_{k+2}\bar F_k^{n}(1,q_2,\ldots,q_k,0,1,q_{k+3},\ldots,q_n) \\
&+ (q_1p_{k+1})p_{k+2}\bar F_k^{n}(1,q_2,\ldots,q_k,0,0,q_{k+3},\ldots,q_n), \tag{17.1}
\end{align*}

and so also

\begin{align*}
\bar F_k^n(X^{1;k+1}) ={}& q_{k+1}q_1\bar F_k^n(1,q_2,\ldots,q_k,1,q_{k+2},\ldots,q_n) \\
&+ p_{k+1}p_1\bar F_k^n(0,q_2,\ldots,q_k,0,q_{k+2},\ldots,q_n) \\
&+ (p_{k+1}q_1)q_{k+2}\bar F_k^{n-1}(q_2,\ldots,q_k,1,1,q_{k+3},\ldots,q_n) \\
&+ (p_{k+1}q_1)p_{k+2}\bar F_k^{n-1}(q_2,\ldots,q_k,1,0,q_{k+3},\ldots,q_n) \\
&+ (q_{k+1}p_1)q_{k+2}\bar F_k^{n}(1,q_2,\ldots,q_k,0,1,q_{k+3},\ldots,q_n) \\
&+ (q_{k+1}p_1)p_{k+2}\bar F_k^{n}(1,q_2,\ldots,q_k,0,0,q_{k+3},\ldots,q_n). \tag{17.2}
\end{align*}

Note that

\begin{align*}
\bar F_k^{n-1}(q_2,\ldots,q_k,1,0,q_{k+3},\ldots,q_n)
&= \bar F_k^{n}(1,q_2,\ldots,q_k,0,0,q_{k+3},\ldots,q_n) \\
&= \prod_{s=2}^{k} q_s + \bar F_k^{n-k-2}(q_{k+3},\ldots,q_n) - \Big(\prod_{s=2}^{k} q_s\Big)\bar F_k^{n-k-2}(q_{k+3},\ldots,q_n), \tag{17.3}
\end{align*}

and therefore

\begin{align*}
&(p_1q_{k+1})p_{k+2}\bar F_k^{n-1}(q_2,\ldots,q_k,1,0,q_{k+3},\ldots,q_n)
 + (q_1p_{k+1})p_{k+2}\bar F_k^{n}(1,q_2,\ldots,q_k,0,0,q_{k+3},\ldots,q_n) \\
&\quad = (p_{k+1}q_1)p_{k+2}\bar F_k^{n-1}(q_2,\ldots,q_k,1,0,q_{k+3},\ldots,q_n)
 + (q_{k+1}p_1)p_{k+2}\bar F_k^{n}(1,q_2,\ldots,q_k,0,0,q_{k+3},\ldots,q_n). \tag{17.4}
\end{align*}

Consequently, from (17.1), (17.2) and (17.4) we have

\begin{align*}
\bar F_k^n(X) - \bar F_k^n(X^{1;k+1})
={}& \left[(p_1q_{k+1})q_{k+2}\bar F_k^{n-1}(q_2,\ldots,q_k,1,1,q_{k+3},\ldots,q_n)
 + (q_1p_{k+1})q_{k+2}\bar F_k^{n}(1,q_2,\ldots,q_k,0,1,q_{k+3},\ldots,q_n)\right] \\
&- \left[(p_{k+1}q_1)q_{k+2}\bar F_k^{n-1}(q_2,\ldots,q_k,1,1,q_{k+3},\ldots,q_n)
 + (q_{k+1}p_1)q_{k+2}\bar F_k^{n}(1,q_2,\ldots,q_k,0,1,q_{k+3},\ldots,q_n)\right] \\
={}& q_{k+2}(q_{k+1} - q_1)\left[\bar F_k^{n-1}(q_2,\ldots,q_k,1,1,q_{k+3},\ldots,q_n)
 - \bar F_k^{n}(1,q_2,\ldots,q_k,0,1,q_{k+3},\ldots,q_n)\right], \tag{17.5}
\end{align*}

proving the proposition.


Proposition 17.3.2 Let $X = (p_1,\ldots,p_n)$, $n > 2k$, $k \ge 2$. Then

\[ G_k^n(X) - G_k^n(X^{1;k+1}) = p_{k+2}(p_{k+1} - p_1)\left[G_k^{n-1}(p_2,\ldots,p_k,1,1,p_{k+3},\ldots,p_n) - G_k^{n}(1,p_2,\ldots,p_k,0,1,p_{k+3},\ldots,p_n)\right]. \]


Proof. This result follows from Proposition 17.3.1 and the fact that

\[ \bar F_k^n(q_1,\ldots,q_n) = G_k^n(p_1,\ldots,p_n), \]

where $p_i = q_i$, $1 \le i \le n$.
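
The identity of Proposition 17.3.1 can be checked numerically on a small system. The Python sketch below is only an illustration: it evaluates both sides by brute force for hypothetical unreliabilities, and the names `f_unreliability` and `check_proposition_17_3_1` are not taken from the chapter.

```python
from itertools import product


def f_unreliability(q, k):
    """Brute-force probability that at least k consecutive components fail."""
    total = 0.0
    for state in product((0, 1), repeat=len(q)):  # 1 marks a failed component
        run = longest = 0
        for failed in state:
            run = run + 1 if failed else 0
            longest = max(longest, run)
        if longest >= k:
            prob = 1.0
            for failed, qi in zip(state, q):
                prob *= qi if failed else 1.0 - qi
            total += prob
    return total


def check_proposition_17_3_1(q, k):
    """Return (left-hand side, right-hand side) of the identity; needs n > 2k."""
    swapped = list(q)
    swapped[0], swapped[k] = swapped[k], swapped[0]  # interchange 1 and k + 1
    lhs = f_unreliability(q, k) - f_unreliability(swapped, k)

    head = q[1:k]       # q_2, ..., q_k
    tail = q[k + 2:]    # q_{k+3}, ..., q_n
    first = f_unreliability(head + [1.0, 1.0] + tail, k)          # n - 1 items
    second = f_unreliability([1.0] + head + [0.0, 1.0] + tail, k)  # n items
    rhs = q[k + 1] * (q[k] - q[0]) * (first - second)
    return lhs, rhs


q = [0.3, 0.5, 0.7, 0.2, 0.6, 0.4, 0.8]  # hypothetical unreliabilities, n = 7
lhs, rhs = check_proposition_17_3_1(q, 3)
print(lhs, rhs)
```

Agreement of the two values on sample inputs does not replace the proof, but it is a quick way to catch a misread index or sign in the formula.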


Lemma 17.3.1 Let $X = (q_1,\ldots,q_n)$ be a design for a linear
consecutive-k-out-of-n:F system, $n > 2k$, $k \ge 2$. If $q_1 < q_{k+1}$, then $X^{1;k+1}$ is a better
design.


Proof. We assume the notation that if T < R then 


Define

\[ W = \bar F_k^{n-1}(q_2,\ldots,q_k,1,1,q_{k+3},\ldots,q_n) - \bar F_k^{n}(1,q_2,\ldots,q_k,0,1,q_{k+3},\ldots,q_n). \tag{17.6} \]

Since $q_{k+1} - q_1 > 0$ by assumption, it is sufficient by Proposition 17.3.1 to
show that $W > 0$.

We shall show $W > 0$. Define $W_i$ for $0 \le i \le k-1$ in the following way. If
$i = 0$, then

\[ W_i = \bar F_k^{n-k}(1,1,q_{k+3},\ldots,q_n) - \bar F_k^{n-k}(0,1,q_{k+3},\ldots,q_n); \]

if $1 \le i \le k-2$, then

\[ W_i = \bar F_k^{n-k+i}(\underbrace{1,\ldots,1}_{i},1,1,q_{k+3},\ldots,q_n) - \bar F_k^{n-k+i}(\underbrace{1,\ldots,1}_{i},0,1,q_{k+3},\ldots,q_n); \]

and if $i = k-1$, then

\[ W_i = \bar F_k^{n-1}(\underbrace{1,\ldots,1}_{k-1},1,1,q_{k+3},\ldots,q_n) - \bar F_k^{n}(1,\underbrace{1,\ldots,1}_{k-1},0,1,q_{k+3},\ldots,q_n). \tag{17.7} \]

Since

\[ p_k + \sum_{i=1}^{k-2}\Big(p_{k-i}\prod_{s=k-i+1}^{k} q_s\Big) + \prod_{s=2}^{k} q_s = 1, \]

by the theorem of total probability, conditioning on the behavior of the items
in positions $2,\ldots,k$, we have that

\[ W = p_kW_0 + \sum_{i=1}^{k-2}\Big(p_{k-i}\prod_{s=k-i+1}^{k} q_s\Big)W_i + \Big(\prod_{s=2}^{k} q_s\Big)W_{k-1}. \tag{17.8} \]

Note that $W_{k-1} = 1 - 1 = 0$ and $W_i > 0$ for all $0 \le i \le k-2$. From this it
follows that $W > 0$, proving the lemma.


Lemma 17.3.2 Let $X = (p_1,\ldots,p_n)$ be a design for a linear
consecutive-k-out-of-n:G system, $n > 2k$, $k \ge 2$. If $q_1 < q_{k+1}$, then $X^{1;k+1}$ is a better
design.


Proof. By reasoning similar to that in the proof of Lemma 17.3.1 it can be
shown that $W > 0$, where $W$ is defined by

\[ W = G_k^{n-1}(p_2,\ldots,p_k,1,1,p_{k+3},\ldots,p_n) - G_k^{n}(1,p_2,\ldots,p_k,0,1,p_{k+3},\ldots,p_n). \tag{17.9} \]

Since $p_{k+1} - p_1 < 0$ by assumption, from Proposition 17.3.2 we have

\[ G_k^n(X) < G_k^n(X^{1;k+1}), \]

proving that $X^{1;k+1}$ is a better design.


Corollary 17.3.1 Let $X = (q_1,\ldots,q_n)$, $n > 2k$, $k \ge 2$. If $X$ is an optimal
design for a linear consecutive-k-out-of-n:F or k-out-of-n:G system, then
$q_1 > q_{k+1}$ and $q_n > q_{n-k}$.


Proof. Let $X$ be an optimal design for a linear consecutive-k-out-of-n:F
system. Suppose that $q_1 < q_{k+1}$ or $q_n < q_{n-k}$. If $q_1 < q_{k+1}$, then from
Lemma 17.3.1 it follows that $X^{1;k+1}$ is a better design. Further, if $q_n < q_{n-k}$,
then by Lemma 17.3.1 applied to the reversed design $X_r = (q_n,\ldots,q_1)$, we
have that $X^{n;n-k}$ is a better design.

The proof for a linear consecutive-k-out-of-n:G system is similar and
follows from Lemma 17.3.2.


Note: In the optimal design of linear consecutive-k-out-of-n systems
with $n \in \{2k+1, 2k+2\}$, the worst component must be placed in position 1
or $n$. This is due to the necessary condition for the optimal design stated in
Corollary 17.3.1 and the necessary conditions of Malon [7] and Kuo et al. [5],
as stated in Section 1. Considering that a design is equivalent to its reversed
version in the sense that their reliabilities are equal, it can be assumed in
algorithms that the worst component is placed in position 1.


17.4 Results for $n = 2k+1$, $k \ge 2$


We shall make use of the following notation.


Definition 2. Let $X = (q_1,\ldots,q_n)$ ($X = (p_1,\ldots,p_n)$). We define

\[ X^{i_1,\ldots,i_r;\,j_1,\ldots,j_r} \]

to be a design obtained from $X$ by interchanging components $i_s$ and $j_s$, for
all $1 \le s \le r$.


Propositions 17.4.1 and 17.4.2 below contain preliminary results to Lemma
17.4.1, followed by Corollary 17.4.1 which states a necessary condition for the
optimal design of a linear consecutive-k-out-of-(2k + 1):F system with $k \ge 2$.


Proposition 17.4.1 Let $X = (q_1,\ldots,q_{2k+1})$, $k \ge 2$. Then

\[ \bar F_k^{2k+1}(X) - \bar F_k^{2k+1}(X^{k;k+1}) = (q_k - q_{k+1})\left(q_1\cdots q_{k-1} - q_{k+2}\cdots q_{2k} + q_{k+2}\cdots q_{2k+1} - q_1\cdots q_{k-1}q_{k+2}\cdots q_{2k+1}\right). \]


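Proof. From

\begin{align*}
\bar F_k^{2k+1}(X) ={}& q_kq_{k+1}\bar F_k^{2k+1}(q_1,\ldots,q_{k-1},1,1,q_{k+2},\ldots,q_{2k+1}) \\
&+ p_kp_{k+1}\bar F_k^{2k+1}(q_1,\ldots,q_{k-1},0,0,q_{k+2},\ldots,q_{2k+1}) \\
&+ p_kq_{k+1}\bar F_k^{2k+1}(q_1,\ldots,q_{k-1},0,1,q_{k+2},\ldots,q_{2k+1}) \\
&+ q_kp_{k+1}\bar F_k^{2k+1}(q_1,\ldots,q_{k-1},1,0,q_{k+2},\ldots,q_{2k+1}), \tag{17.10}
\end{align*}

and consequently

\begin{align*}
\bar F_k^{2k+1}(X^{k;k+1}) ={}& q_{k+1}q_k\bar F_k^{2k+1}(q_1,\ldots,q_{k-1},1,1,q_{k+2},\ldots,q_{2k+1}) \\
&+ p_{k+1}p_k\bar F_k^{2k+1}(q_1,\ldots,q_{k-1},0,0,q_{k+2},\ldots,q_{2k+1}) \\
&+ p_{k+1}q_k\bar F_k^{2k+1}(q_1,\ldots,q_{k-1},0,1,q_{k+2},\ldots,q_{2k+1}) \\
&+ q_{k+1}p_k\bar F_k^{2k+1}(q_1,\ldots,q_{k-1},1,0,q_{k+2},\ldots,q_{2k+1}), \tag{17.11}
\end{align*}

we have

\begin{align*}
\bar F_k^{2k+1}(X) - \bar F_k^{2k+1}(X^{k;k+1})
={}& (q_k - q_{k+1})\left[\bar F_k^{2k+1}(q_1,\ldots,q_{k-1},1,0,q_{k+2},\ldots,q_{2k+1})
 - \bar F_k^{2k+1}(q_1,\ldots,q_{k-1},0,1,q_{k+2},\ldots,q_{2k+1})\right] \\
={}& (q_k - q_{k+1})\left(q_1\cdots q_{k-1} - q_{k+2}\cdots q_{2k}
 + q_{k+2}\cdots q_{2k+1} - q_1\cdots q_{k-1}q_{k+2}\cdots q_{2k+1}\right), \tag{17.12}
\end{align*}

proving the proposition.


Proposition 17.4.2 Let $X = (q_1,\ldots,q_{2k+1})$ and $Y = X^{1,\ldots,k-1;\,2k,\ldots,k+2}$.
Then

\[ \bar F_k^{2k+1}(X) - \bar F_k^{2k+1}(Y) = (q_{k+2}\cdots q_{2k} - q_1\cdots q_{k-1})(q_{k+1}p_{2k+1} + q_{2k+1} - q_k). \]


Proof. From Shanthikumar's recursive algorithm [11], as stated in Section
17.2, it follows that

\[ \bar F_k^{2k+1}(X) = \bar F_k^{2k}(q_1,\ldots,q_{2k}) + p_{k+1}q_{k+2}\cdots q_{2k}q_{2k+1}(1 - q_1\cdots q_{k-1}q_k) \tag{17.13} \]

and

\[ \bar F_k^{2k+1}(Y) = \bar F_k^{2k}(q_{2k},\ldots,q_{k+2},q_k,q_{k+1},q_{k-1},\ldots,q_1) + p_{k+1}q_{k-1}\cdots q_1q_{2k+1}(1 - q_{k+2}\cdots q_{2k}q_k). \tag{17.14} \]

Note that

\begin{align*}
&p_{k+1}q_{k+2}\cdots q_{2k}q_{2k+1}(1 - q_1\cdots q_{k-1}q_k) - p_{k+1}q_{k-1}\cdots q_1q_{2k+1}(1 - q_{k+2}\cdots q_{2k}q_k) \\
&\quad = p_{k+1}q_{2k+1}(q_{k+2}\cdots q_{2k} - q_1\cdots q_{k-1}). \tag{17.15}
\end{align*}

Also, we have

\begin{align*}
\bar F_k^{2k}(q_1,\ldots,q_{2k}) ={}& p_kp_{k+1}\cdot 0
 + q_kq_{k+1}\bar F_{k-2}^{2(k-2)}(q_2,\ldots,q_{k-1},q_{k+2},\ldots,q_{2k-1}) \\
&+ p_kq_{k+1}(q_{k+2}\cdots q_{2k}) + q_kp_{k+1}(q_1\cdots q_{k-1}) \tag{17.16}
\end{align*}

and

\begin{align*}
\bar F_k^{2k}(q_{2k},\ldots,q_{k+2},q_k,q_{k+1},q_{k-1},\ldots,q_1) ={}& p_kp_{k+1}\cdot 0
 + q_kq_{k+1}\bar F_{k-2}^{2(k-2)}(q_2,\ldots,q_{k-1},q_{k+2},\ldots,q_{2k-1}) \\
&+ p_kq_{k+1}(q_1\cdots q_{k-1}) + p_{k+1}q_k(q_{k+2}\cdots q_{2k}), \tag{17.17}
\end{align*}

and so

\[ \bar F_k^{2k}(q_1,\ldots,q_{2k}) - \bar F_k^{2k}(q_{2k},\ldots,q_{k+2},q_k,q_{k+1},q_{k-1},\ldots,q_1) = (q_{k+1} - q_k)(q_{k+2}\cdots q_{2k} - q_1\cdots q_{k-1}). \tag{17.18} \]

From (17.13)-(17.15) and (17.18) it follows that

\begin{align*}
\bar F_k^{2k+1}(X) - \bar F_k^{2k+1}(Y)
={}& (q_{k+1} - q_k)(q_{k+2}\cdots q_{2k} - q_1\cdots q_{k-1})
 + p_{k+1}q_{2k+1}(q_{k+2}\cdots q_{2k} - q_1\cdots q_{k-1}) \\
={}& (q_{k+2}\cdots q_{2k} - q_1\cdots q_{k-1})(q_{k+1}p_{2k+1} + q_{2k+1} - q_k), \tag{17.19}
\end{align*}

and so the proposition follows.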
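
Both closed-form differences, that of Proposition 17.4.1 and that of Proposition 17.4.2, can be compared against direct evaluation for small $k$. The following Python sketch does so by brute force; it is an illustration only, and the names `f_unreliability`, `check_section_17_4` and the sample values are hypothetical.

```python
from itertools import product
from math import prod


def f_unreliability(q, k):
    """Brute-force probability that at least k consecutive components fail."""
    total = 0.0
    for state in product((0, 1), repeat=len(q)):  # 1 marks a failed component
        run = longest = 0
        for failed in state:
            run = run + 1 if failed else 0
            longest = max(longest, run)
        if longest >= k:
            prob = 1.0
            for failed, qi in zip(state, q):
                prob *= qi if failed else 1.0 - qi
            total += prob
    return total


def check_section_17_4(q, k):
    """For a design q of length 2k + 1 return (lhs1, rhs1, lhs2, rhs2)."""
    a = prod(q[:k - 1])       # q_1 ... q_{k-1}
    b = prod(q[k + 1:2 * k])  # q_{k+2} ... q_{2k}
    q_k, q_k1, q_last = q[k - 1], q[k], q[2 * k]
    base = f_unreliability(q, k)

    # Proposition 17.4.1: interchange components k and k + 1.
    swapped = list(q)
    swapped[k - 1], swapped[k] = swapped[k], swapped[k - 1]
    lhs1 = base - f_unreliability(swapped, k)
    rhs1 = (q_k - q_k1) * (a - b + b * q_last - a * b * q_last)

    # Proposition 17.4.2: Y = (q_2k, ..., q_{k+2}, q_k, q_{k+1}, q_{k-1}, ..., q_1, q_{2k+1}).
    y = q[k + 1:2 * k][::-1] + [q_k, q_k1] + q[:k - 1][::-1] + [q_last]
    lhs2 = base - f_unreliability(y, k)
    rhs2 = (b - a) * (q_k1 * (1.0 - q_last) + q_last - q_k)
    return lhs1, rhs1, lhs2, rhs2


q = [0.3, 0.5, 0.7, 0.2, 0.6, 0.4, 0.8]  # hypothetical unreliabilities, k = 3
print(check_section_17_4(q, 3))
```

In each pair the brute-force difference and the closed form should agree up to rounding error.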
Lemma 17.4.1 Let $X = (q_1,\ldots,q_{2k+1})$ be a design for a linear
consecutive-k-out-of-(2k + 1):F system, $k \ge 2$. Let $X$ satisfy the necessary
conditions for the optimal design given by Malon [7] and Kuo et al. [5], as stated
in Section 1. Assume $q_{k+1} < q_k$. If

\[ q_1\cdots q_{k-1} \ge q_{k+2}\cdots q_{2k}, \tag{17.20} \]

then $X^{k;k+1}$ is a better design, while if

\[ q_1\cdots q_{k-1} < q_{k+2}\cdots q_{2k}, \tag{17.21} \]

then $X^{1,\ldots,k-1;\,2k,\ldots,k+2}$ is a better design and $(q_{2k+1},q_1,\ldots,q_{2k})$ is a better
design.


Proof. Suppose $q_1\cdots q_{k-1} \ge q_{k+2}\cdots q_{2k}$. Then, since $q_k - q_{k+1} > 0$ and

\[ q_{k+2}\cdots q_{2k+1} - q_1\cdots q_{k-1}q_{k+2}\cdots q_{2k+1} > 0, \]

by Proposition 17.4.1 we have

\[ \bar F_k^{2k+1}(X) - \bar F_k^{2k+1}(X^{k;k+1}) > 0, \tag{17.22} \]

and so $X^{k;k+1}$ is a better design.

Suppose $q_1\cdots q_{k-1} < q_{k+2}\cdots q_{2k}$. We have assumed the values $q_i$ are
distinct, so $q_{2k+1} \ne q_k$. If $q_{2k+1} < q_k$, then by the necessary conditions of
Malon [7] and Kuo et al. [5] we have

\[ q_1 > \cdots > q_k > q_{2k+1} > q_{2k} > \cdots > q_{k+2}, \tag{17.23} \]

and then $q_1\cdots q_{k-1} > q_{k+2}\cdots q_{2k}$, contrary to assumption. Hence

\[ q_{2k+1} > q_k, \tag{17.24} \]

and so by Proposition 17.4.2 we have

\[ \bar F_k^{2k+1}(X) - \bar F_k^{2k+1}(X^{1,\ldots,k-1;\,2k,\ldots,k+2}) > 0, \tag{17.25} \]

proving that

\[ X^{1,\ldots,k-1;\,2k,\ldots,k+2} = (q_{2k},\ldots,q_{k+2},q_k,q_{k+1},q_{k-1},\ldots,q_1,q_{2k+1}) \]

is a better design. Define $X' = (q'_1,\ldots,q'_{2k+1}) = X^{1,\ldots,k-1;\,2k,\ldots,k+2}$. Note
that $q'_{k+1} < q'_k$ and $q'_1\cdots q'_{k-1} > q'_{k+2}\cdots q'_{2k}$, and so by Proposition 17.4.1,
as we have shown in the earlier part of this proof, interchanging components
$q'_{k+1}$ and $q'_k \in X'$ improves the design. Since a design

\[ (q'_1,\ldots,q'_{k-1},q'_{k+1},q'_k,q'_{k+2},\ldots,q'_{2k+1}) = (q_{2k},\ldots,q_1,q_{2k+1}) \]

is better than $X'$, which is better than $X$, and

\[ \bar F_k^{2k+1}(q_{2k+1},q_1,\ldots,q_{2k}) = \bar F_k^{2k+1}(q_{2k},\ldots,q_1,q_{2k+1}), \tag{17.26} \]

so $(q_{2k+1},q_1,\ldots,q_{2k})$ is better than $X$.


Note that the rearrangement $X^{1,\ldots,k-1;\,2k,\ldots,k+2}$ as given in Lemma 17.4.1
above is equivalent to:


• taking the $(2k+1)$-th component and putting it on the left-hand side of
the system, next to the first component (in position 0);

• interchanging components $k$ and $(k+1)$; and then

• reversing the order of components.


Corollary 17.4.1 Let $X = (q_1, \ldots, q_{2k+1})$ be a design for a linear
consecutive-$k$-out-of-$(2k+1)$:F system, $k > 2$. If $X$ is optimal, then

$$\min\{q_1, q_{2k+1}\} > q_{k+1} > \max\{q_k, q_{k+2}\}. \tag{17.27}$$

Proof. From Corollary 17.3.1 we have $\min\{q_1, q_{2k+1}\} > q_{k+1}$. If $X$
is optimal, then it satisfies the necessary conditions for the optimal design
given by Malon [7] and Kuo et al. [5], as stated in Section 17.2. From Lemma
17.4.1 it follows that $q_{k+1} > q_k$ must be satisfied. Similarly, from
Lemma 17.4.1 applied to the reversed design $X_r = (q_{2k+1}, \ldots, q_1)$,
we have $q_{k+1} > q_{k+2}$.


17.5 Results for n = 2k+2,k > 2 


Propositions 17.5.1 and 17.5.2 below contain preliminary results for Lemma
17.5.1, followed by Corollary 17.5.1 which gives a necessary condition for the
optimal design of a linear consecutive-$k$-out-of-$(2k+2)$:F system.


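Proposition 17.5.1 Let $X = (q_1, \ldots, q_{2k+2})$, $k > 2$. Then

$$\bar{F}_k^{2k+2}(X) - \bar{F}_k^{2k+2}(X^{k+1;k+2}) = (q_{k+2} - q_{k+1}) \cdot
\big[(p_{2k+2}\, q_{k+3} \cdots q_{2k+1} - p_1\, q_2 \cdots q_k)
- (p_{2k+2} - p_1)\, q_2 \cdots q_k\, q_{k+3} \cdots q_{2k+1}\big].$$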


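Proof. Since

$$\begin{aligned}
\bar{F}_k^{2k+2}(X) ={}& q_{k+1} q_{k+2}\, \bar{F}_k^{2k+2}(q_1, \ldots, q_k, 1, 1, q_{k+3}, \ldots, q_{2k+2}) \\
&+ p_{k+1} p_{k+2}\, \bar{F}_k^{2k+2}(q_1, \ldots, q_k, 0, 0, q_{k+3}, \ldots, q_{2k+2}) \\
&+ p_{k+1} q_{k+2}\, \bar{F}_k^{2k+2}(q_1, \ldots, q_k, 0, 1, q_{k+3}, \ldots, q_{2k+2}) \\
&+ q_{k+1} p_{k+2}\, \bar{F}_k^{2k+2}(q_1, \ldots, q_k, 1, 0, q_{k+3}, \ldots, q_{2k+2})
\end{aligned} \tag{17.28}$$

and

$$\begin{aligned}
\bar{F}_k^{2k+2}(X^{k+1;k+2}) ={}& q_{k+1} q_{k+2}\, \bar{F}_k^{2k+2}(q_1, \ldots, q_k, 1, 1, q_{k+3}, \ldots, q_{2k+2}) \\
&+ p_{k+1} p_{k+2}\, \bar{F}_k^{2k+2}(q_1, \ldots, q_k, 0, 0, q_{k+3}, \ldots, q_{2k+2}) \\
&+ p_{k+2} q_{k+1}\, \bar{F}_k^{2k+2}(q_1, \ldots, q_k, 0, 1, q_{k+3}, \ldots, q_{2k+2}) \\
&+ q_{k+2} p_{k+1}\, \bar{F}_k^{2k+2}(q_1, \ldots, q_k, 1, 0, q_{k+3}, \ldots, q_{2k+2}),
\end{aligned} \tag{17.29}$$

we have

$$\begin{aligned}
\bar{F}_k^{2k+2}(X) - \bar{F}_k^{2k+2}(X^{k+1;k+2})
={}& (q_{k+2} - q_{k+1}) \cdot \big[\bar{F}_k^{2k+2}(q_1, \ldots, q_k, 0, 1, q_{k+3}, \ldots, q_{2k+2}) \\
&\qquad - \bar{F}_k^{2k+2}(q_1, \ldots, q_k, 1, 0, q_{k+3}, \ldots, q_{2k+2})\big] \\
={}& (q_{k+2} - q_{k+1}) \big[q_1 \cdots q_k + q_{k+3} \cdots q_{2k+1} - q_1 \cdots q_k\, q_{k+3} \cdots q_{2k+1} \\
&\qquad - q_2 \cdots q_k - q_{k+3} \cdots q_{2k+2} + q_2 \cdots q_k\, q_{k+3} \cdots q_{2k+2}\big] \\
={}& (q_{k+2} - q_{k+1}) \big[(p_{2k+2}\, q_{k+3} \cdots q_{2k+1} - p_1\, q_2 \cdots q_k) \\
&\qquad - (p_{2k+2} - p_1)\, q_2 \cdots q_k\, q_{k+3} \cdots q_{2k+1}\big],
\end{aligned} \tag{17.30}$$

proving the proposition.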
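The identity of Proposition 17.5.1 can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it assumes that each $q_i$ is a component failure probability with $p_i = 1 - q_i$, that the system fails exactly when some $k$ consecutive components fail, and it evaluates the failure probability by brute-force enumeration of all component states. The function names and the sample values of $q$ are hypothetical.

```python
from itertools import product
from math import prod, isclose


def failure_probability(q, k):
    """Probability that some k consecutive components fail.
    q[i] is the failure probability of the component in position i+1."""
    n = len(q)
    total = 0.0
    for state in product((0, 1), repeat=n):          # 1 means "failed"
        if any(all(state[i:i + k]) for i in range(n - k + 1)):
            total += prod(q[i] if s else 1.0 - q[i] for i, s in enumerate(state))
    return total


def swapped(q, i, j):
    """Copy of q with the components in 1-based positions i and j interchanged."""
    x = list(q)
    x[i - 1], x[j - 1] = x[j - 1], x[i - 1]
    return x


def rhs_17_5_1(q, k):
    """Right-hand side of Proposition 17.5.1 for a design of length 2k+2."""
    p = [1.0 - v for v in q]
    a = prod(q[1:k])                  # q_2 ... q_k
    b = prod(q[k + 2:2 * k + 1])      # q_{k+3} ... q_{2k+1}
    p_first, p_last = p[0], p[2 * k + 1]
    return (q[k + 1] - q[k]) * ((p_last * b - p_first * a) - (p_last - p_first) * a * b)


if __name__ == "__main__":
    k = 3
    q = [0.30, 0.12, 0.25, 0.08, 0.15, 0.22, 0.10, 0.35]   # hypothetical design, n = 2k+2
    lhs = failure_probability(q, k) - failure_probability(swapped(q, k + 1, k + 2), k)
    print(lhs, rhs_17_5_1(q, k), isclose(lhs, rhs_17_5_1(q, k), abs_tol=1e-12))
```

Enumeration costs $2^n$ evaluations, so this check is only practical for small $k$.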


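Proposition 17.5.2 Let $X = (q_1, \ldots, q_{2k+2})$, $k > 2$. Then

$$\bar{F}_k^{2k+2}(X) - \bar{F}_k^{2k+2}(X^{1;2k+2}) = (q_1 - q_{2k+2}) \cdot
\big[(p_{k+1}\, q_2 \cdots q_k - p_{k+2}\, q_{k+3} \cdots q_{2k+1})
- (p_{k+1} - p_{k+2})\, q_2 \cdots q_k\, q_{k+3} \cdots q_{2k+1}\big].$$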


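Proof. Since

$$\begin{aligned}
\bar{F}_k^{2k+2}(X) ={}& q_1 q_{2k+2}\, \bar{F}_k^{2k+2}(1, q_2, \ldots, q_{2k+1}, 1) \\
&+ p_1 p_{2k+2}\, \bar{F}_k^{2k+2}(0, q_2, \ldots, q_{2k+1}, 0) \\
&+ p_1 q_{2k+2}\, \bar{F}_k^{2k+1}(q_2, \ldots, q_{2k+1}, 1) \\
&+ p_{2k+2} q_1\, \bar{F}_k^{2k+1}(1, q_2, \ldots, q_{2k+1})
\end{aligned} \tag{17.31}$$

and

$$\begin{aligned}
\bar{F}_k^{2k+2}(X^{1;2k+2}) ={}& q_1 q_{2k+2}\, \bar{F}_k^{2k+2}(1, q_2, \ldots, q_{2k+1}, 1) \\
&+ p_1 p_{2k+2}\, \bar{F}_k^{2k+2}(0, q_2, \ldots, q_{2k+1}, 0) \\
&+ p_{2k+2} q_1\, \bar{F}_k^{2k+1}(q_2, \ldots, q_{2k+1}, 1) \\
&+ p_1 q_{2k+2}\, \bar{F}_k^{2k+1}(1, q_2, \ldots, q_{2k+1}),
\end{aligned} \tag{17.32}$$

we have

$$\bar{F}_k^{2k+2}(X) - \bar{F}_k^{2k+2}(X^{1;2k+2}) = (q_1 - q_{2k+2}) \cdot
\big[\bar{F}_k^{2k+1}(1, q_2, \ldots, q_{2k+1}) - \bar{F}_k^{2k+1}(q_2, \ldots, q_{2k+1}, 1)\big]. \tag{17.33}$$

From this, by Shanthikumar's recursive algorithm [11], as stated in Section
17.2, it follows that

$$\begin{aligned}
\bar{F}_k^{2k+2}(X) - \bar{F}_k^{2k+2}(X^{1;2k+2})
={}& (q_1 - q_{2k+2}) \cdot \big[\{\bar{F}_k^{2k}(q_2, \ldots, q_{2k+1}) + p_{k+1}\, q_2 \cdots q_k\, (1 - q_{k+2} \cdots q_{2k+1})\} \\
&\qquad - \{\bar{F}_k^{2k}(q_2, \ldots, q_{2k+1}) + p_{k+2}\, q_{k+3} \cdots q_{2k+1}\, (1 - q_2 \cdots q_{k+1})\}\big] \\
={}& (q_1 - q_{2k+2}) \cdot \big[p_{k+1}\, q_2 \cdots q_k\, (1 - q_{k+3} \cdots q_{2k+1} + p_{k+2}\, q_{k+3} \cdots q_{2k+1}) \\
&\qquad - p_{k+2}\, q_{k+3} \cdots q_{2k+1}\, (1 - q_2 \cdots q_k + p_{k+1}\, q_2 \cdots q_k)\big] \\
={}& (q_1 - q_{2k+2}) \big[(p_{k+1}\, q_2 \cdots q_k - p_{k+2}\, q_{k+3} \cdots q_{2k+1}) \\
&\qquad - (p_{k+1} - p_{k+2})\, q_2 \cdots q_k\, q_{k+3} \cdots q_{2k+1}\big],
\end{aligned} \tag{17.34}$$

completing the proof.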
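As a cross-check of Proposition 17.5.2, the sketch below evaluates the system failure probability with a simple run-length dynamic programme and compares the two sides of the identity on randomly generated designs. This is a generic recursion written for illustration; it is not claimed to be the form of Shanthikumar's algorithm stated in Section 17.2. It assumes $q_i$ are failure probabilities with $p_i = 1 - q_i$, and all function names are hypothetical.

```python
import random
from math import prod, isclose


def failure_probability_dp(q, k):
    """Failure probability of a linear consecutive-k-out-of-n:F system.
    alive[r] = P(no run of k failures so far, and the last r components failed)."""
    alive = [1.0] + [0.0] * (k - 1)
    for qi in q:
        nxt = [0.0] * k
        nxt[0] = (1.0 - qi) * sum(alive)      # component works: run length resets
        for r in range(k - 1):
            nxt[r + 1] = qi * alive[r]        # component fails: run length grows
        alive = nxt                           # mass qi * alive[k-1] is lost to failure
    return 1.0 - sum(alive)


def rhs_17_5_2(q, k):
    """Right-hand side of Proposition 17.5.2 for a design of length 2k+2."""
    p = [1.0 - v for v in q]
    a = prod(q[1:k])                  # q_2 ... q_k
    b = prod(q[k + 2:2 * k + 1])      # q_{k+3} ... q_{2k+1}
    return (q[0] - q[-1]) * ((p[k] * a - p[k + 1] * b) - (p[k] - p[k + 1]) * a * b)


def ends_swapped(q):
    """Copy of q with the first and last components interchanged (X^{1;2k+2})."""
    x = list(q)
    x[0], x[-1] = x[-1], x[0]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    k = 3
    q = [rng.uniform(0.01, 0.5) for _ in range(2 * k + 2)]
    lhs = failure_probability_dp(q, k) - failure_probability_dp(ends_swapped(q), k)
    print(isclose(lhs, rhs_17_5_2(q, k), rel_tol=1e-7, abs_tol=1e-12))
```

The dynamic programme runs in $O(nk)$ time, so it also serves for designs too long to enumerate.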


Lemma 17.5.1 Let $X = (q_1, \ldots, q_{2k+2})$ be a design for a linear
consecutive-$k$-out-of-$(2k+2)$:F system, $q_1 > q_{2k+2}$. Assume
$q_{k+1} < q_{k+2}$. If

$$q_{k+3} \cdots q_{2k+1} \ge q_2 \cdots q_k, \tag{17.35}$$

then $X^{k+1;k+2}$ is a better design, whereas if

$$q_{k+3} \cdots q_{2k+1} \le q_2 \cdots q_k, \tag{17.36}$$

then $X^{1;2k+2}$ is a better design.


Proof. If $q_{k+3} \cdots q_{2k+1} \ge q_2 \cdots q_k$, then

$$\begin{aligned}
p_{2k+2}\, q_{k+3} \cdots q_{2k+1} - p_1\, q_2 \cdots q_k
&\ge (p_{2k+2} - p_1)\, q_2 \cdots q_k \\
&> (p_{2k+2} - p_1)\, q_2 \cdots q_k\, q_{k+3} \cdots q_{2k+1},
\end{aligned} \tag{17.37}$$

and so by Proposition 17.5.1 we have
$\bar{F}_k^{2k+2}(X) > \bar{F}_k^{2k+2}(X^{k+1;k+2})$, proving that
$X^{k+1;k+2}$ is a better design.

If $q_{k+3} \cdots q_{2k+1} \le q_2 \cdots q_k$, then

$$\begin{aligned}
p_{k+1}\, q_2 \cdots q_k - p_{k+2}\, q_{k+3} \cdots q_{2k+1}
&\ge (p_{k+1} - p_{k+2})\, q_{k+3} \cdots q_{2k+1} \\
&> (p_{k+1} - p_{k+2})\, q_2 \cdots q_k\, q_{k+3} \cdots q_{2k+1},
\end{aligned} \tag{17.38}$$

and from Proposition 17.5.2 it follows that
$\bar{F}_k^{2k+2}(X) > \bar{F}_k^{2k+2}(X^{1;2k+2})$, proving that
$X^{1;2k+2}$ is a better design and completing the proof.


Corollary 17.5.1 Let $X = (q_1, \ldots, q_{2k+2})$ be a design for a linear
consecutive-$k$-out-of-$(2k+2)$:F system, $k > 2$. If $X$ is optimal, then

(i) $(q_1, q_{k+1}, q_{k+2}, q_{2k+2})$ is singular; and
(ii) $(q_1, \ldots, q_k, q_{k+3}, \ldots, q_{2k+2})$ is nonsingular.


Proof. Without loss of generality we may assume $q_1 > q_{2k+2}$. For
$q_1 < q_{2k+2}$ we apply the reasoning below to the reversed design
$X_r = (q_{2k+2}, \ldots, q_1)$.

Suppose $(q_1, q_{k+1}, q_{k+2}, q_{2k+2})$ is nonsingular. Then
$q_{k+1} < q_{k+2}$, and by Lemma 17.5.1 we have that either $X^{k+1;k+2}$ or
$X^{1;2k+2}$ must be a better design. Hence $X$ is not optimal, contrary to
the assumption, and (i) follows.

Suppose that $(q_1, \ldots, q_k, q_{k+3}, \ldots, q_{2k+2})$ is singular.
Then, since from above $(q_1, q_{k+1}, q_{k+2}, q_{2k+2})$ must be singular,
we have that $X$ is singular, contrary to the necessary condition of
nonsingularity stated by O'Reilly in ([9], Corollary 1). Hence (ii) follows
and this completes the proof.


17.6 Procedures to improve designs not satisfying the 
necessary conditions for the optimal design 


The procedures below follow directly from the results of Lemmas 17.3.1, 
17.3.2, 17.4.1, 17.5.1 respectively. Procedure 17.6.3 also applies the necessary 
conditions for the optimal design given by Malon [7] and Kuo et al. [5], as 
stated in Section 17.1. 


Procedure 17.6.1 Let $X$ be a design for a linear consecutive-$k$-out-of-$n$:F
or a linear consecutive-$k$-out-of-$n$:G system, with $n > 2k$, $k > 2$. In
order to improve the design, if $q_1 < q_{k+1}$, interchange components $q_1$
and $q_{k+1}$. Next, if $q_n < q_{n-k+1}$, interchange components $q_n$ and
$q_{n-k+1}$.


Procedure 17.6.2 Let $X$ be a design for a linear
consecutive-$k$-out-of-$(2k+1)$:F system, $k > 2$. Rearrange the components in
the positions from 1 to $k$, and then the components in the positions from
$(2k+1)$ to $(k+2)$, in non-decreasing order of component reliability. In
order to improve the design, proceed as follows:

• If $q_{k+1} < q_k$:

1. Interchange components $q_{k+1}$ and $q_k$, when
$q_1 \cdots q_{k-1} \ge q_{k+2} \cdots q_{2k}$; otherwise take the
$q_{2k+1}$ component and put it on the left-hand side of the system, next to
the $q_1$ component (in position 0).

2. In a design obtained in this way, rearrange the components in the
positions from 1 to $k$, and then the components in the positions from
$(2k+1)$ to $(k+2)$, in non-decreasing order of component reliability.

3. If required, repeat steps 1-3 to further improve this new rearranged
design or until the condition $q_{k+1} > q_k$ is satisfied.

• If $q_{k+1} < q_{k+2}$, reverse the order of components and apply steps 1-3
to the rearranged design.


Procedure 17.6.3 Let $X$ be a design for a linear
consecutive-$k$-out-of-$(2k+2)$:F system, with $k > 2$. In order to improve
the design:

• If $q_1 > q_{2k+2}$ and $q_{k+1} < q_{k+2}$, interchange components

1. $q_{k+1}$ and $q_{k+2}$, when $q_{k+3} \cdots q_{2k+1} \ge q_2 \cdots q_k$, or
2. $q_1$ and $q_{2k+2}$, when $q_{k+3} \cdots q_{2k+1} \le q_2 \cdots q_k$.

• If $q_1 < q_{2k+2}$ and $q_{k+1} > q_{k+2}$, interchange components

1. $q_{k+1}$ and $q_{k+2}$, when $q_{k+3} \cdots q_{2k+1} \le q_2 \cdots q_k$, or
2. $q_1$ and $q_{2k+2}$, when $q_{k+3} \cdots q_{2k+1} \ge q_2 \cdots q_k$.
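One application of Procedure 17.6.3 can be sketched in Python as follows. This is a minimal illustration, assuming that a design is given as a list of failure probabilities $q_1, \ldots, q_{2k+2}$; the function name `improve_2k_plus_2` is hypothetical. When the two products are equal both interchanges are permitted by the procedure, and the sketch arbitrarily chooses the middle interchange.

```python
from math import prod


def improve_2k_plus_2(q, k):
    """Apply Procedure 17.6.3 once to a design of length 2k+2.
    Returns a new list; the design is returned unchanged if no rule applies."""
    if len(q) != 2 * k + 2:
        raise ValueError("design must have 2k+2 components")
    x = list(q)
    a = prod(x[1:k])                   # q_2 ... q_k
    b = prod(x[k + 2:2 * k + 1])       # q_{k+3} ... q_{2k+1}
    q_first, q_mid_left, q_mid_right, q_last = x[0], x[k], x[k + 1], x[-1]

    if q_first > q_last and q_mid_left < q_mid_right:
        swap_middle = b >= a           # first bullet of the procedure
    elif q_first < q_last and q_mid_left > q_mid_right:
        swap_middle = b <= a           # second bullet (the reversed design)
    else:
        return x                       # neither precondition holds

    if swap_middle:
        x[k], x[k + 1] = x[k + 1], x[k]      # interchange q_{k+1} and q_{k+2}
    else:
        x[0], x[-1] = x[-1], x[0]            # interchange q_1 and q_{2k+2}
    return x


if __name__ == "__main__":
    design = [0.30, 0.12, 0.25, 0.08, 0.15, 0.22, 0.10, 0.05]   # hypothetical, k = 3
    print(improve_2k_plus_2(design, 3))
```

For the hypothetical design shown, $q_1 > q_8$, $q_4 < q_5$ and $q_6 q_7 < q_2 q_3$, so the end components are interchanged.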
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Chapter 18 
Optimizing properties of polypropylene and elastomer
compounds containing wood flour


Pavel Spiridonov, Jan Budin, Stephen Clarke and Jani Matisons 


Abstract Despite the fact that wood flour has been known as an inexpensive
filler in plastics compounds for many years, commercial wood-filled plastics
are not widely used. One reason for this has been the poor mechanical
properties of wood-filled compounds. Recent publications report advances in
wood flour modification and compatibilization of polymer matrices, which have
led to an improvement in processability and the mechanical properties of the
blends. In most cases the compounds were obtained in Brabender-type mixers.
In this work the authors present the results for direct feeding of mixtures of
wood flour and thermoplastic materials (polypropylene and SBS elastomer) in
injection molding. The obtained blends were compared with Brabender-mixed
compounds from the point of view of physical and mechanical properties and
aesthetics. It was shown that polymer blends with rough grades of wood flour
(particle size >300 microns) possess a better decorative look and a lower
density but, at the same time, poorer mechanical properties. Usage of
compatibilizers allowed the authors to optimize the tensile strength of these
compounds.


Key words: Tensile strength, wood-filled plastic, polypropylene elastomer, 
wood flour, optimal properties 


18.1 Introduction 


Wood flour is referred to [1] as an extender — a type of filler that is added 
to polymer compounds as a partial substitute for expensive plastic or elas- 
tomeric material. The major advantages of organic fillers, including wood 
flour, are their relatively low cost and low density [2]. Despite the fact that 
wood flour has been known as a filler since the 1950s, its commercial appli- 
cation has been rather restricted. During the past 10-15 years, new organic 
materials such as rice husks, oil palm fibers, natural bast fibers and sisal 
strands have been studied as fillers [3]—[6]. Previous investigations in the area 
of traditional wood fibers have studied particular types of wood such as euca- 
lyptus [7] or ponderosa pine [8, 9]. Most of the above organic fillers are used 
in composite materials [2]—[4],[6]—[8],[10]—[14]. 

The primary objective of this research was to find cost-effective ways to 
optimize properties of polypropylene and thermoplastic elastomers filled with 
wood flour. To eliminate a blending operation from the manufacturing pro- 
cess, direct feeding of an injection-molding machine with the polymer-filler 
mixtures was employed. 

Filling the polymer compounds with wood flour would allow not only a 
decrease in their cost, but would also reduce the environmental effect by 
utilizing wood wastes and making recyclable or bio-degradable products [2, 
15, 16]. From this point of view, the authors did not select a particular type 
of wood; instead we used a mixture of unclassified sawdust. 


18.2 Methodology 


18.2.1 Materials 


Two polymers were used as the polymer matrix. Polypropylene (unfilled,
density 0.875 g/cm³, melt index 10 cm³/10 min at 230°C) was chosen because it
is one of the most common industrial plastics. Styrene-butadiene tri-block
(SBS) elastomer (unfilled, rigid-to-soft fraction ratio 30/70, density 0.921
g/cm³, melt index 12 cm³/10 min at 230°C) was selected because of the
growing popularity of thermoplastic elastomers (TPE), which is due to their
“soft touch feeling” properties and their use in two-component injection
molding.
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To compatibilize the wood flour with polymers [9]—[13], maleic anhydride 
grafted polypropylene (PP—MA) and styrene-ethylene / butylene-styrene 
elastomer (SEBS-MA) were added to the mix. The content of maleic anhydride
in PP-MA and SEBS-MA was 1.5% and 1.7%, respectively.

As a filler, a mixture of local unclassified sawdust was used. The mixture 
was separated into 4 fractions. The characteristics of the fractions are given 
in Table 18.1. 


Table 18.1 Physical characteristics of the wood flour fractions

Fraction   Particle size, µm   Sieve mesh size   Density, g/cm³
1          600-850             20                0.92
2          300-600             30                1.17
3          150-300             60                1.37
4          <150                100               1.56


18.2.2 Sample preparation and tests 


Wood flour samples were pre-dried for 6 hours at 50-60°C in the electric 
oven before blending. The polymer—wood flour blends were obtained in two 
ways. For injection molding, the polymer and filler were mixed just before 
molding. No additional pre-compounding was used. The specimens for ten- 
sile test (Australian Standard AS 1145) were molded in a 22 (metric) tonne 
injection-molding machine. 

These blends were also pre-mixed in a Brabender mixer at 40 rpm at 180°C 
to 190°C. The polymers were first introduced in the mixer; the wood flour was 
then added when the polymers melted (a constant torque was reached). The 
total mixing time was 6-8 min depending on the composition. Each blend 
weighed 65-70 grams. While warm, the blended materials were formed into a 
2-mm sheet in the laboratory vulcanization press under 10 MPa pressure at 
180°C. The tensile specimens were punched from the sheets using a standard 
cutting die. Tensile testing of the above specimens was conducted according 
to AS 1145 on a horizontal tensile test machine. Five test samples of each 
compound were tested. 

Densities of the wood flour and polymer compounds were determined by 
a volumetric method in either water or methylated spirits. 


18.3 Results and discussions 


18.3.1 Density of compounds 


Comparison of the densities of the polymer compounds provides information 
about the quality and interactions between the polymer matrix and filler. 
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From Table 18.1 it can be seen that the density of wood flour depends on 
the particle size in each fraction. The fractions consisting of smaller particles 
have a higher density. This is because wood is a cellulose material, which has 
a porous structure [4]. Larger wood particles retain this structure with
considerably less displacement of voids by the flotation medium. On the other
hand, for smaller particles a higher percentage of voids is filled by the
flotation liquid [14], resulting in a higher density. The difference in density should
influence the density of the polymer compounds that contain different wood 
flour fractions. When the compounds were molded in the injection-molding 
machine or pressed after mixing in the Brabender mixer, their density was 
both measured and calculated. The calculations were based on the density 
of the polymer matrix and the wood flour and their ratio in the compounds. 
The results are presented in Figure 18.1. 
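The chapter does not state the formula used for the calculated densities. A common choice, shown here purely as an assumption, is the inverse rule of mixtures on weight fractions for an ideal, void-free blend: $1/\rho = \sum_i w_i/\rho_i$. The function name `blend_density` is hypothetical; the densities in the example are those given for polypropylene and for wood flour fraction 4 in Table 18.1, and the 40% filler content is assumed to be by weight.

```python
def blend_density(components):
    """Density (g/cm^3) of an ideal, void-free blend.
    components: list of (weight_fraction, density in g/cm^3) pairs."""
    fractions = [w for w, _ in components]
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("weight fractions must sum to 1")
    # specific volumes (cm^3 per gram of blend) add up for an ideal mixture
    specific_volume = sum(w / rho for w, rho in components)
    return 1.0 / specific_volume


if __name__ == "__main__":
    # 60% polypropylene (0.875 g/cm^3) with 40% wood flour fraction 4 (1.56 g/cm^3)
    print(round(blend_density([(0.60, 0.875), (0.40, 1.56)]), 3))
```

A measured density below such a calculated value would be consistent with voids remaining in the compound, which is the kind of difference discussed in the text.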

A difference was observed between the two processing methods and the 
calculated values. The results indicated that the densities of un-coupled PP 
were lower when molded in the injection-molding machine. The most sta- 
ble results were obtained when both PP and SBS compounds were mixed 
in the Brabender mixer and then were formed in the press. Stark et al. [8] 
have explained this observation by the compression of the compounds to 
the maximum density that the wood cell walls can sustain. This correlates 
with our results in regard to the Brabender method and the difference from 
the calculated values. However, the pressure created in an injection mold 
is comparable with the pressure developed in the press. Although an injection
machine creates a bigger plasticizing effect, the total blending time is
shorter (1-1.5 min) than for the Brabender mixer (6-8 min). Therefore, in
addition to the effect of compression, blending time is a very important 
parameter. 

The use of modifiers improved the quality of compounds without increas- 
ing the blending time. Thus maleated polypropylene allowed us to obtain 
compounds with close densities both in injection molding and the Braben- 
der mixer (see Figure 18.1). This is because of the compatibilization impact 
of maleic anhydride, which is achieved by improving the polymer matrix 
impregnation, improving fiber dispersion, enhancing the interfacial adhesion 
and other effects [9]-[13]. 


18.3.2 Comparison of compounds obtained 
in a Brabender mixer and an injection-molding 
machine 


The difference in mixing by injection molding and by the Brabender method 
influences not only the density of compounds but also their mechanical 
properties. Despite the fact that the tensile strength of the control com- 
pound (unmodified polypropylene) molded in the injection machine was 
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Fig. 18.1 Density of (a) polypropylene and (b) SBS in elastomer compounds for different 
blending methods. 


higher than that of Brabender-mixed specimens, most of the other com- 
pounds were weaker (see Figure 18.2). The primary influential parameters 
have been discussed above. In addition to pressure and blending time, tem- 
perature is also an important technological parameter. The mixing tempera- 
ture of the Brabender mixer was 185—190°C, which is below the temperature 
of wood degradation (200°C) [7]. The injection-molding temperature varied 
for polypropylene from 185°C in the center zone to 200°C in the nozzle. 
[Figure 18.2; compounds shown: PP; PP + 40% frac. 1; PP + 40% frac. 2; PP + 40% frac. 3; PP + 40% frac. 4; PP + 10% PP-MA + 40% frac. 3; PP + 10% PP-MA + 40% frac. 4]


Fig. 18.2 Comparison of tensile strength of the compounds obtained in an injection- 
molding machine and in a Brabender mixer. 


For the injection of SBS elastomer, the temperatures in the barrel were set 
10-15°C higher. The observations of the injection molding of polymer—wood 
flour mixtures showed that stability of this process depends on the filler / 
polymer ratio and the temperature. When the content of wood flour was be- 
low 40%, the process and the quality of the molded specimens were both quite 
stable. When the content of wood flour exceeded 50%, its volume exceeded 
the polymer volume and it became hard to obtain a consistent quality. In 
addition, because wood flour is hard and does not melt during the process, 
the friction between metal parts of the machine (screw, barrel) and the wood 
flour particles is very high, which also prevents the polymer matrix from 
forming a continuous phase. Therefore it was visually detected that the dis- 
tribution of wood flour in the polymer was not regular (for example, particle 
agglomerates were observed) when the wood filler content was greater than 
50% weight. The maximum content of wood flour in the following experiments 
was maintained at 40%. 

During the injection-molding experiments it was noticed that higher tem- 
peratures led to the formation of vapors in the compounds. It was accounted 
for by decomposition of the wood flour, which is known to start at around 
200°C [7]. In such cases it was difficult or even impossible to obtain good 
specimens, despite high mold pressure. Thus we were unable to mold mix- 
tures of nylon with wood flour, because nylon requires higher injection tem- 
peratures (240-260°C). Therefore the direct feeding of polymer—wood flour 
mixtures into an injection-molding machine can be done only when the 
wood flour content is less than 40% and the polymer has a melting point 
below 200°C. 
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18.3.3 Compatibilization of the polymer matrix 
and wood flour


The mechanical properties of the wood-filled compounds were found to de- 
pend not only on the wood flour content but also on the fraction (particle 
size). Thus a maximum reduction in tensile strength was observed in the case 
of blends containing wood flour fraction 1, which contained the largest par- 
ticle size (600-850 microns). It was found when the PP and SBS blends con- 
tained 40% of wood flour fraction 1, tensile strength losses of 72% and 60% in 
respect to the control samples occurred (see Figure 18.3). The best strength 
(15.2 MPa) of SBS compounds was achieved when they contained fraction 


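[Figure 18.3; vertical axis: tensile strength, MPa. Panel (a) legend: no modifier, 10% PP-MA. Panel (b) legend: no modifier, 5% SEBS-MA; compounds shown: SBS; SBS + 40% frac. 1; SBS + 40% frac. 2; SBS + 40% frac. 3; SBS + 40% frac. 4]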


Fig. 18.3 Influence of wood flour fractions and the modifier on the tensile strength of 
injection-molded specimens of the (a) PP and (b) SBS compounds. 
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4 (particles <150 microns). The PP compound containing 40% of fraction 
2 had the best mechanical properties, although this result was practically 
within the statistical deviations (8%) for the strength of the compounds con- 
taining fractions 3 and 4. Similar results for PP compounds were described 
in [8]. 

The influence of the particle size on the mechanical properties of wood 
flour-polymer compounds can be explained by incompatibility between the 
polymer matrix and the filler, and the differences in their physical and me- 
chanical properties. The wood particles differ from the polymers by their 
chemical nature [4], and good adhesion between them cannot be achieved. 
Therefore a large proportion of the filler in the compounds prevents the
matrix from forming a continuous phase, which leads to a reduction of the
mechanical properties of the blends [8, 15]. Bigger particles act as
concentration points where deformation and strain occur. Coupling agents and
chemically modified polymers were introduced to improve interfacial adhesion
between the polymers and fillers [9]-[13]. In this research,
maleic anhydride grafted PP and SEBS were used as modifiers for PP and 
SBS blends respectively. As shown above (see Figures 18.1 and 18.2) PP-MA 
was able to improve the properties of the compounds. Figure 18.3 demon- 
strates that PP—MA led to the improvement and stabilization of mechanical 
properties [12, 13] for injection-molded compounds regardless of the wood 
fraction. The modified SBS compounds had similar properties to the control
specimens. Mixing wood flour with a polymer base and modifiers in the
Brabender mixer (see Figure 18.2) gave even better results and compensated
for the loss of mechanical properties of the compounds containing a high
content of wood flour [12, 13].

The improved compatibility between the polymer matrix and the wood 
flour particles led to homogeneous morphology of the compounds [9, 11, 14].
Thus maleated polymers provided a compatibilization effect in the filled com- 
pounds. 


18.3.4 Optimization of the compositions 


The introduction of wood flour as a filler and maleated polymers as modifiers 
to the polymer base resulted in opposing technical and economic effects [12, 
13]. The relative cost of the PP compounds decreased by 40-50% as the 
content of the wood filler increased. At the same time, the tensile strength 
of these compounds dropped to 40%. Increasing the PP-MA content up to a
certain point allowed a 50% improvement in mechanical properties; however,
the cost of the compounds was 2-3 times higher. This example demonstrates
the necessity to optimize these wood-filled compositions. 

The relative cost of the PP and SBS compounds was calculated as a ratio 
of the actual cost of the compound to the cost of the control (pure) material. 
[Figure 18.4; axis labels: % wood flour and % SEBS-MA]


Fig. 18.4 Relative cost of the (a) PP and (b) SBS compounds depending on the content 
of wood flour and maleated polymers. 


The calculations were based on the cost of raw materials and their content 
in the compounds. Figure 18.4 shows that PP compounds are cheaper than 
the control sample (with a relative cost of less than 1) where they contain 
a considerable amount of wood flour and less than 17% PP-—MA. For SBS 
compounds, the equilibrium cost threshold was much higher. It follows from 
Figure 18.4 that it is possible to introduce 50% SEBS—MA to the compound 
containing 50% wood flour without increasing the cost of the modified com- 
pound. The difference between the optimum cost of the PP and SBS com- 
pounds is caused by the difference in the cost of raw materials. Thus the 
cost of the SBS elastomer is ~ 3 times higher than the cost of virgin PP and 
the cost of maleated SEBS is much higher than the cost of PP-MA. Under 
these conditions, the application of a cheap filler such as wood flour provides 
an economic effect, allowing the manufacturers greater flexibility with the 
composition. 
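The relative-cost calculation described above can be sketched as follows. The unit costs in the example are hypothetical placeholders chosen only to make the arithmetic visible; they are not the prices used by the authors. Contents are assumed to be weight fractions, and the function name `relative_cost` is introduced here for illustration.

```python
def relative_cost(fractions, unit_costs, control):
    """Cost of a compound divided by the cost of the pure control material.
    fractions: material -> weight fraction (must sum to 1)
    unit_costs: material -> cost per unit mass (any consistent currency)
    control: name of the pure base polymer used as the reference."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("weight fractions must sum to 1")
    compound_cost = sum(share * unit_costs[name] for name, share in fractions.items())
    return compound_cost / unit_costs[control]


if __name__ == "__main__":
    # Hypothetical prices per kilogram, in arbitrary units (assumptions only).
    costs = {"PP": 1.0, "wood flour": 0.1, "PP-MA": 4.0}
    blend = {"PP": 0.50, "wood flour": 0.40, "PP-MA": 0.10}
    print(relative_cost(blend, costs, control="PP"))
```

A value below 1 means the filled compound is cheaper than the pure polymer; raising the share of the expensive maleated modifier pushes the value back above 1, which is the trade-off shown in Figure 18.4.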

In addition to financial savings, the use of wood flour can provide plastics
companies with the benefit of an aesthetically pleasing, natural-looking,
wood-filled finish. Figure 18.5 provides an indication of
the decorative properties possible for PP compounds containing 40% of dif- 
ferent grades of wood flour. It can be noticed that rough grades (fraction 1 
and 2) provide a more natural wood look to the plastics than do fractions 3 
and 4. Therefore wood flour with particle sizes in the range from 300 to 850 
microns can be recommended for use in decorative plastics. Despite the fact 
that those fractions decrease the mechanical properties of the compounds, 
these properties may not necessarily be an essential criterion for decorative 
parts and components. It should be possible to find an optimal balance be- 
tween the properties and the cost of the compounds in the way described 
above. 
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Fig. 18.5 Photographs of the PP compounds containing 40% wood flour of different 
fractions. 


18.4 Conclusions 


The results of this research clearly demonstrate that it is possible to elim- 
inate a mixing operation and make PP and SBS products with wood flour 
content up to 40% directly by injection molding. Due to its low degradation 
temperature, wood flour cannot be blended with polymers at temperatures 
higher than 200°C. This limitation has to be considered when selecting a 
polymer. 

Our research has also shown that in addition to conventional plastics, 
thermoplastic elastomers can be filled with wood flour. The combination of 
compatible wood flour-filled plastic and elastomer materials can be used in
two-component injection-molding technology. 

Lower mechanical properties of the wood fiber-filled products can be
compensated for using maleic anhydride grafted polymers. The properties of the
wood flour—polymer compounds can be optimized from the point of view of 
their mechanical properties and cost. 

Due to the different influences of wood flour fractions on the properties of 
the polymer compounds, they can be applied in different ways. Thus wood 
flour grades with particle sizes less than 300 microns can be used as extenders 
to replace expensive plastic materials, which allows for economic savings. The 


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grades with particles in the range between 300 to 850 microns can be used 
in decorative plastics. 
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Chapter 19 


Constrained spanning, Steiner trees 
and the triangle inequality 


Prabhu Manyem 


Abstract We consider the approximation characteristics of constrained span- 
ning and Steiner tree problems in weighted undirected graphs where the edge 
costs and delays obey the triangle inequality. The constraint here is in the 
number of hops a message takes to reach other nodes in the network from a 
given source. A hop, for instance, can be a message transfer from one end of 
a link to the other. A weighted hop refers to the amount of delay experienced 
by a message packet in traversing the link. The main result of this chapter 
shows that no approximation algorithm for a delay-constrained spanning tree 
satisfying the triangle inequality can guarantee a worst case approximation 
ratio better than $O(\log n)$ unless $NP \subset DTIME(n^{\log\log n})$. This result extends
to the corresponding problem for Steiner trees which satisfy the triangle in- 
equality as well. 


Key words: Minimum spanning tree, maximum spanning tree, triangle in- 
equality, Steiner tree, APX, approximation algorithm, asymptotic worst case 
ratio 


19.1 Introduction 


Consider a network G = (V,E) where a certain node (the source or the 
speaker) broadcasts messages to all the other nodes (the destinations or the 
receivers) in the network. When a broadcast occurs, suppose the network 
links through which the message is relayed need to be leased for a given 
non-negative cost $c_{ij}$, where $i$ and $j$ are the end nodes of the link $(i, j) \in E$.
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A feasible solution to this single source broadcast problem is a set of leased 
links so that the message from the specified source reaches all destinations, 
and the message passes through each (intermediate) node at most once — 
because a receiver hearing the same piece of message more than once could 
become confused. In other words, there should be no loops or cycles in the 
feasible solution. (Any solution with loops can be modified by removing a 
few edges to break the loops, with no increase in cost. Hence we consider 
only acyclic solutions.) A piece of message (or data) is known as a packet in 
telecommunications terminology. 

The cost of a solution is the sum of the costs of the leased links, and 
an optimal solution is one which minimizes this overall cost. This broad- 
cast problem can be modeled as a MinST (minimum spanning tree) and the 
solution is a tree rooted at the pre-defined source s. 
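As an illustration of the unconstrained broadcast problem, the sketch below builds a minimum spanning tree rooted at the source with Prim's algorithm. The chapter does not prescribe a particular algorithm; Prim's method is used here only as a familiar example, and the function name, the node numbering 0, ..., n-1 and the sample network are all hypothetical.

```python
import heapq


def minimum_spanning_tree(n, edges, source=0):
    """Prim's algorithm on an undirected graph with nodes 0..n-1.
    edges: list of (u, v, cost). Returns (total_cost, parent), where
    parent[v] is the node from which v receives the message (None for the source)."""
    adjacency = {v: [] for v in range(n)}
    for u, v, cost in edges:
        adjacency[u].append((cost, v))
        adjacency[v].append((cost, u))

    parent = {source: None}
    total = 0
    heap = [(cost, source, v) for cost, v in adjacency[source]]
    heapq.heapify(heap)
    while heap and len(parent) < n:
        cost, u, v = heapq.heappop(heap)
        if v in parent:
            continue                       # v already receives the message
        parent[v] = u                      # lease link (u, v)
        total += cost
        for next_cost, w in adjacency[v]:
            if w not in parent:
                heapq.heappush(heap, (next_cost, v, w))
    if len(parent) < n:
        raise ValueError("network is not connected")
    return total, parent


if __name__ == "__main__":
    links = [(0, 1, 1), (1, 2, 2), (0, 2, 4), (2, 3, 3), (1, 3, 5)]
    print(minimum_spanning_tree(4, links, source=0))
```

The returned parent map describes the tree rooted at the source $s$, so each node hears the message exactly once.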

As opposed to the broadcast problem, in the multicast problem, the mes- 
sage needs to be sent only to a select group of nodes in the network, known 
as the multicasting group. Just as the broadcast version lends itself to a 
spanning tree formulation, the multicast version lends itself to a Steiner tree 
formulation. 

Suppose we add the following constraint to the broadcast problem: that 
the number of hops taken by a message to reach any destination from a given 
source vertex s is bounded by a threshold value A. A hop can be defined as 
a message transfer from one end of a link to the other. We call this the hop- 
constrained spanning tree problem or HCSP. This problem has been shown 
to be NP-hard [12]. A variation of the HCSP is the DCSP, the diameter- 
constrained spanning tree problem, where the diameter (the number of edges 
in the longest path of the solution obtained) obeys an upper bound of A. 
The DCSP is also NP-hard [4]. 

The CSP, the delay-constrained spanning tree problem, is a generalization 
of the HCSP. Each edge in the network has two distinct parameters: (1) 
a cost $c_{ij}$ and (2) a delay $d_{ij}$. (Here, delay refers to the amount of delay
experienced by a message packet in traveling from one end of the link to the 
other. The total delay in a link can be broken down into transmission delay, 
switching delay and queueing delay, of which transmission delay is usually 
predominant.) The delay parameter can be considered to be a weighted hop. 
If in a CSP the delay is set to one for each edge, one obtains an HCSP. 
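The relationship between the two constraints can be illustrated with a small feasibility check. The sketch assumes that the delay constraint bounds the total delay along the tree path from the source to every node, by analogy with the hop constraint described above. A tree is given as a parent map (as produced by any spanning-tree routine), and the names `max_delay_from_source`, `is_feasible` and `unit_delays` are hypothetical.

```python
def max_delay_from_source(parent, delay):
    """Largest source-to-node delay in a rooted tree.
    parent: node -> parent node (None for the source)
    delay: frozenset({u, v}) -> delay of link (u, v)"""
    memo = {}

    def path_delay(v):
        if parent[v] is None:
            return 0                       # the source itself
        if v not in memo:
            memo[v] = path_delay(parent[v]) + delay[frozenset((v, parent[v]))]
        return memo[v]

    return max(path_delay(v) for v in parent)


def is_feasible(parent, delay, threshold):
    """True if every node is reached within the given delay threshold."""
    return max_delay_from_source(parent, delay) <= threshold


def unit_delays(parent):
    """Delay of one on every tree link: the CSP check then counts hops (HCSP)."""
    return {frozenset((v, u)): 1 for v, u in parent.items() if u is not None}


if __name__ == "__main__":
    tree = {0: None, 1: 0, 2: 1, 3: 2}
    delays = {frozenset((0, 1)): 2, frozenset((1, 2)): 1, frozenset((2, 3)): 4}
    print(max_delay_from_source(tree, delays))
    print(max_delay_from_source(tree, unit_delays(tree)))
```

With unit delays the same function returns the number of hops to the farthest node, which is the quantity bounded in the HCSP.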

For a given minimization problem $P$, let $A$ be an approximation algorithm
and $P_I$ the set of instances in $P$. For a given instance $I \in P_I$, let
the cost of the solution obtained by $A$ be $A_I$. Let the cost of the optimal
solution for $I$ be $OPT_I$. Then the approximation ratio of $A$ for instance
$I$ is $R_{AI} = A_I / OPT_I$. Over all instances $I \in P_I$, the absolute
performance ratio is defined as [4]:

$$R_A = \inf\{r \ge 1 : R_{AI} \le r \text{ for all } I \in P_I\}. \tag{19.1}$$
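The per-instance ratio is straightforward to compute when both costs are known. The sketch below, with hypothetical function names and made-up sample costs, evaluates the ratio on a finite set of instances. Because the absolute performance ratio must hold for every instance of the problem, the largest ratio observed on a finite sample is only a lower bound on it.

```python
def approximation_ratio(heuristic_cost, optimal_cost):
    """Ratio A_I / OPT_I for one instance of a minimization problem."""
    if optimal_cost <= 0:
        raise ValueError("optimal cost must be positive")
    return heuristic_cost / optimal_cost


def worst_observed_ratio(results):
    """Largest ratio over a finite sample of (heuristic_cost, optimal_cost) pairs.
    This is a lower bound on the absolute performance ratio, not its value."""
    return max(approximation_ratio(a, opt) for a, opt in results)


if __name__ == "__main__":
    sample = [(12, 10), (30, 20), (7, 7)]   # made-up costs for three instances
    print(worst_observed_ratio(sample))
```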


The lower the value of $R_A$, the better the heuristic $A$. A constant value
of $R_A$ is superior to a value that depends on the size of instances, for
example, $R_A \in O(n)$ or $R_A \in O(\log n)$. Lund and Yannakakis [7] show
that the SET COVER problem cannot be in the class APX, which is the class of
problems for which it is possible to construct a polynomial time heuristic $A$
that guarantees a constant value on $R_A$. Feige [3] showed that unless
$NP \subset DTIME(n^{\log\log n})$, the SET COVER problem cannot be
approximated to within a factor better than $O(\log n)$. Manyem and Stallmann
[9] have shown that an HCSP, and
hence a CSP too, cannot be in the complexity class APX. Results from [2] 
indicate that a DCSP is unlikely to be in APX. Heuristics for the Steiner 
tree version of the problem with general costs and weighted hops appear in 
Manyem [8]. 

Marathe, Ravi et al. [10] consider networks with both cost and delay
parameters on the edges. They provide an approximation algorithm that
guarantees a diameter within $O(\log |V|)$ of the given threshold A and a
total cost within $O(\log |V|)$ of the optimum. A vast compendium of results
on approximability is provided in Ausiello et al. [1].

Figure 19.1 provides a road map of some of the optimization problems that 
arise in telecommunication networks. Here S is the set of terminal nodes for 
Steiner tree problems. In multicasting terminology, S is the set of conference 
nodes. Problem 1, the Constrained Steiner Tree (CST), is the hardest in the 
figure; all other problems are special cases of CSTs. Given an instance of a 
CST, if we set the multicast group to be all nodes in the network, we obtain 
the Constrained Spanning Tree (Problem 3). Given an instance of a CST, if we 
set the multicast group to be just two nodes which need to communicate with 
each other, we obtain Problem 2, the Constrained Shortest Path. Problem 7 
is a Hop-Constrained Spanning Tree, and Problem 4 is a Hop-Constrained 
Steiner Tree. 


Fig. 19.1 A Constrained Steiner Tree and some of its special cases. (Legend: 
NP-C, NP-complete; poly. time, solvable in polynomial time.) 

All problems above the dotted line in Figure 19.1 are NP-complete, and 
the ones below can be solved in polynomial time. Positive results from one 
problem to another flow in the direction of the arrows, and negative results 
flow in the direction against that shown by the arrows. For example, if we can 
develop a heuristic for Problem 1 that guarantees an upper bound B on the 
approximation ratio over all instances, this will also hold true for all problems 
in the figure. On the other hand, if we can show (a negative result) that unless 
$NP \subseteq DTIME(n^{\log\log n})$, there can be no heuristic that guarantees an upper 
bound of B for Problem 7, then this will also be true for Problems 1, 3 and 
4. See Table 19.1 for further details. 


Table 19.1 Constrained Steiner Tree and special cases: References 


Problem number in Figure 19.1    Results and references 

1, 3-5                           [8] and [9] 
2                                [5] 
4                                [2], [8] and [9] 
6                                Shortest path problem 
7                                [2], [12] and Problem ND4 in [4] 
8                                [8] and [9] 
9, 11                            Problem ND30 in [4] 
10                               [12] and Problem ND4 in [4] 
12                               Breadth First Search 


The proof in [12] that Problem 10 in Figure 19.1 is NP-hard renders 
Problems 1, 3, 4 and 7 NP-hard as well. Similarly, the proofs in [9] show that 
unless $NP \subseteq DTIME(n^{\log\log n})$, Problems 7 and 8 cannot be approximated 
to better than $O(\log n)$. Hence this non-approximability result carries over 
to Problems 1, 3, 4 and 5. 

In this chapter, we consider special cases of CSPs and HCSPs where the 
edge costs and delays obey the triangle inequality (we call these problems 
CSP$_T$s and HCSP$_T$s respectively). First, in Section 19.2, we show that the 
cost of any spanning tree solution for a CSP$_T$ or an HCSP$_T$ in a given network 
$G = (V,E)$ is at most $|V| - 1$ times the cost of any other spanning tree 
solution for G. This implies that any solution is within a $|V| - 1$ factor of 
the optimal solution. 

Next, in Section 19.3, we prove that the lower bound for any approximation 
algorithm for a CSP$_T$ is $\Theta(\log n)$. Unless $NP \subseteq DTIME(n^{\log\log n})$, no 
approximation algorithm can guarantee an $R_A$ better than this. We show 
this by an E-Reduction (explained in Section 19.3.1) from the SET COVER 
problem. 


19.2 Upper bounds for approximation 


We first show that for a given network $G = (V,E)$ with non-negative costs 
$c_{ij}$ on undirected edges $(i,j) \in E$, the value of a spanning tree is at most 
$|V| - 1$ times that of any other. We shall assume that the underlying graph 
is complete without loss of generality (if the network is not complete, we can 
add edges to the network with costs that obey the triangle inequality). We 
start with a well-known result for such graphs. 


Remark 1. For any two vertices i and j in G, where the edge costs of G obey 
the triangle inequality, the edge $(i,j)$ is also a least expensive path in G 
between these two vertices. 


19.2.1 The most expensive edge is at most a minimum 
spanning tree 


We show here that $L_{max}$, the cost of the most expensive edge in E, is at most 
equal to $T_{min}$, the cost of a MinST$_G$ (minimum spanning tree for G). Let 
the endpoints of the most expensive edge be s and t. Let $L_p$ be the cost of 
the s-t path using the edges in MinST$_G$. From Remark 1, it follows that 
$c_{st} = L_{max} \le L_p$. Since $L_p \le T_{min}$, we conclude that $L_{max} \le T_{min}$. 


Remark 2. In a network G where the edge costs obey the triangle inequality, 
the cost of the most expensive edge in G is at most the cost of a minimum 
spanning tree of G. 


19.2.2 MaxST is at most (n - 1) MinST 


Let $T_{max}$ be the value of a MaxST$_G$ (maximum spanning tree of G) where 
$|V| = n$. Since $L_{max}$ is the cost of the most expensive edge in G and a 
spanning tree has $n - 1$ edges, it follows that $T_{max} \le (n-1)L_{max}$. From 
Remark 2, $L_{max} \le T_{min}$. Thus $T_{max} \le (n-1)T_{min}$, which is what we set out 
to show in this section. 


Remark 3. For a given undirected network $G = (V,E)$ which satisfies the 
triangle inequality, the ratio of the cost of MaxST$_G$ to that of MinST$_G$ has 
an upper bound of $|V| - 1$. Hence the ratio of the costs of any two spanning 
trees in G has this upper bound. 


Remark 4. For undirected networks where the edge costs obey the triangle 
inequality, the performance ratio $R_A$ for any approximation algorithm A has 
an upper bound of $|V|$ for any version of the spanning tree problem that has 
the objective of minimizing the sum of the edge costs in the feasible solution. 


In particular, the above remark is true for CSP$_T$s and HCSP$_T$s. 
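
Remarks 2 and 3 can be checked numerically on small instances. The following is a minimal sketch, not part of the original argument: it assumes Euclidean distances between random points as one convenient family of costs obeying the triangle inequality, and the function names (`spanning_tree_cost`, `random_metric_instance`, `check_remarks`) are illustrative only.

```python
import itertools
import math
import random


def spanning_tree_cost(n, cost, maximize=False):
    """Kruskal's algorithm on the complete graph over vertices 0..n-1.

    With maximize=True the edges are scanned from most to least expensive,
    which yields a maximum spanning tree instead of a minimum one.
    """
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = sorted(itertools.combinations(range(n), 2),
                   key=lambda e: cost[e], reverse=maximize)
    total = 0.0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += cost[(u, v)]
    return total


def random_metric_instance(n, seed):
    """Assumption: Euclidean distances, which obey the triangle inequality."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    return {(u, v): math.dist(pts[u], pts[v])
            for u, v in itertools.combinations(range(n), 2)}


def check_remarks(n, seed, eps=1e-12):
    """True if L_max <= T_min and T_max <= (n - 1) * T_min on this instance."""
    cost = random_metric_instance(n, seed)
    t_min = spanning_tree_cost(n, cost)
    t_max = spanning_tree_cost(n, cost, maximize=True)
    l_max = max(cost.values())
    return l_max <= t_min + eps and t_max <= (n - 1) * t_min + eps
```

A check of this kind only illustrates the bound on sampled instances; the proof above is what establishes it in general.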


19.3 Lower bound for a CSP approximation 


An upper bound on the performance ratio $R_A$ for any approximation 
algorithm for a CSP$_T$ and an HCSP$_T$ is provided in Remark 4. Let us now 
turn to proving a lower bound for a CSP$_T$. We show in this section that 
unless $NP \subseteq DTIME(n^{\log\log n})$, there can be no heuristic that can guarantee 
a performance ratio better than $O(\log n)$ for a CSP$_T$. We show this by an 
E-Reduction from a SET COVER to a CSP$_T$. Since this lower bound holds 
for a SET COVER [3, 7], it does so for a CSP$_T$ too, via E-Reduction. Recall 
that a CSP$_T$ is the version of a CSP where the edge costs obey the triangle 
inequality. 


19.3.1 E-Reductions: Definition 


If problem A E-reduces to problem B, then B is as hard to approximate as 
A. The formal definition of E-Reduction is as follows. 


Definition 1 (E-reduction [6]). A problem A E-reduces to a problem B, 
or $A \le_E B$, if there exist polynomial time functions f and g and a constant 
$\beta$ such that 

(1) f maps an instance I of A to an instance J of B; and 

(2) g maps solutions T of J to solutions S of I such that 


$$e(I,S) \le \beta\, e(J,T), \tag{19.2}$$

where the error term e(I,S) is defined below. 


Definition 2 (Error [6]). For minimization problems, a solution S to an 
instance I has error e(I,S) if 


$$\frac{V(I,S)}{opt(I)} = 1 + e(I,S), \tag{19.3}$$


where V(I,S) is the value of a solution S to instance I and opt(I) is the 
value of an optimal solution to I. 


Another type of reduction used in approximability theory is the 
L-Reduction introduced in [11]. 


19.3.2 SET COVER 


A SET COVER instance is defined by a ground set $X = \{x_i \mid 1 \le i \le p\}$ and 
a collection $Y = \{y_j \mid 1 \le j \le q\}$, each $y_j$ being a subset of X. The goal is to 
find a cover $Y'$ of X such that (a) $Y' \subseteq Y$, (b) $|Y'|$, the cardinality of $Y'$, is 
minimal, and (c) $Y'$ satisfies 


$$\bigcup_{y_j \in Y'} y_j = X.$$


19.3.3 Reduction from SET COVER 


For a CSP$_T$, a spanning tree needs to be determined such that (1) its cost is 
minimal and (2) the sum of the edge delays in the path from a specified vertex 
$s \in V$ (the source) to every other vertex in V is at most $\Delta$, a non-negative 
integer. 

We create an instance of a CSP$_T$ as follows (see Figure 19.2). For each 
$x_i \in X$ and $y_j \in Y$ in SET COVER, create a vertex. Create an additional 
vertex s. Thus $|V| = |X| + |Y| + 1$. Since $|V| = n$, $|X| = p$ and $|Y| = q$, we 
have $n = p + q + 1$. The edges in E in the instance $G = (V,E)$ of the CSP$_T$ are 
assigned as in Table 19.2. The costs and delays assigned to the edges are given 
in Columns 3 and 4 of the table respectively. The graph G is complete. 
However, in the interests of clarity, not all edges are shown in Figure 19.2. 
Only edge costs are shown in the figure, not edge delays. 


Fig. 19.2 A CSP$_T$ instance reduced from SET COVER (not all edges shown). 


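Table 19.2 E-Reduction of a SET COVER to a CSP$_T$: Costs and delays of edges in G 


Edge set   Definition                                                                   Cost    Delay 
$E_1$      $\{(s,y_j) \mid 1 \le j \le q\}$                                             n       1 
$E_2$      $\{(s,x_i) \mid 1 \le i \le p\}$                                             n+1     2 
$E_3$      $\{(y_j,x_i) \mid x_i \in y_j,\ 1 \le i \le p,\ 1 \le j \le q\}$             1       1 
$E_4$      $\{(y_j,x_i) \mid x_i \notin y_j,\ 1 \le i \le p,\ 1 \le j \le q\}$          1       2 
$E_5$      $\{(y_i,y_j) \mid 1 \le i < j \le q\}$                                       1       1 
$E_6$      $\{(x_i,x_j) \mid 1 \le i < j \le p\}$                                       1       1 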

Let $\Delta$ = the delay constraint at all vertices = 2. Note that both the edge 
costs and the edge delays in G obey the triangle inequality. (In most cases, 
both the cost and the transmission delay of an edge directly relate to its 
length. Further, queueing and switching delays are usually minor. Hence the 
cost $c_{ij}$ and delay $d_{ij}$ of an edge are closely related; they could be directly 
proportional, for example. However, there may be instances where the 
increase in edge delay is significantly faster than that of edge cost. For instance, 
due to a high degree of congestion in the network, queueing and switching 
delays could be far higher than normal.) From Table 19.2, the total set of 
edges of the graph G is given by $E = \bigcup_{i=1}^{6} E_i$. 
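
The construction of Table 19.2 can be written out directly. The sketch below is an illustration under assumptions of our own: the ground set is represented as {0, ..., p-1}, vertices are tagged tuples, and the names `build_csp_instance` and `obeys_triangle_inequality` are hypothetical. It builds the complete graph G and checks by brute force that both costs and delays obey the triangle inequality.

```python
import itertools


def build_csp_instance(p, subsets):
    """Build the complete graph G of Table 19.2.

    p        -- size of the ground set X = {0, ..., p-1}
    subsets  -- list of q sets, each a subset of X (the collection Y)
    Returns (vertices, cost, delay); both dicts are keyed by frozenset({u, v}).
    """
    q = len(subsets)
    n = p + q + 1
    s = ("s",)
    xs = [("x", i) for i in range(p)]
    ys = [("y", j) for j in range(q)]
    cost, delay = {}, {}

    def add(u, v, c, d):
        cost[frozenset((u, v))] = c
        delay[frozenset((u, v))] = d

    for y in ys:                                   # E1: (s, y_j)
        add(s, y, n, 1)
    for x in xs:                                   # E2: (s, x_i)
        add(s, x, n + 1, 2)
    for y in ys:                                   # E3 if x_i in y_j, else E4
        for x in xs:
            add(y, x, 1, 1 if x[1] in subsets[y[1]] else 2)
    for u, v in itertools.combinations(ys, 2):     # E5: (y_i, y_j)
        add(u, v, 1, 1)
    for u, v in itertools.combinations(xs, 2):     # E6: (x_i, x_j)
        add(u, v, 1, 1)
    return [s] + xs + ys, cost, delay


def obeys_triangle_inequality(vertices, weight):
    """Check w(u, v) <= w(u, m) + w(m, v) for all distinct u, v, m."""
    return all(
        weight[frozenset((u, v))]
        <= weight[frozenset((u, m))] + weight[frozenset((m, v))]
        for u, v, m in itertools.permutations(vertices, 3)
    )
```

For example, with p = 4 and the collection {0,1}, {1,2}, {2,3}, {0,3}, the instance has n = 9 vertices and 36 edges.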


19.3.4 Feasible Solutions 


Recall that the delay constraint is equal to 2. It is possible for the $(s, x_i)$ 
edges to be utilized in a feasible solution; if they are, they can be replaced 
by other edges with no increase in cost. Note that there can be no paths 
of the form $s - x_i - y_j$ nor of the form $s - y_j - x_i$, where $(x_i, y_j) \in E_4$; that 
is, when $x_i \notin y_j$ in the SET COVER problem. This is due to the high delay 
(2 units) of such edges. In either of the paths just mentioned, the leaf vertex 
would experience a delay of at least 3. 

Suppose for $x_i = x_0$, the edge $(s, x_0)$ is in the feasible solution $S_0$ 
returned by a heuristic. This edge can be replaced as follows. For any $y_j \in Y$, 
the edge $(x_0, y_j)$ is not in the feasible solution $S_0$, otherwise the delay at 
such a $y_j$ would be at least 3, violating the delay constraint. There are two 
possible cases here: 


• Suppose there exists a $y_0$ such that $x_0 \in y_0$ in SET COVER, and edge 
$(s, y_0) \in S_0$. Then replace $(s, x_0)$ with $(x_0, y_0)$ to obtain a new solution $S_1$. 
Observe that cost[$S_0$] - cost[$S_1$] = n units (the cost decreases). 

• Alternatively, such a $y_0$ may not exist. In any case, there is at least one 
edge $(x_0, y_j)$ of delay 1 (that is, in $E_3$) for some $y_j \in Y$, otherwise no 
feasible solution is ever possible for the CSP$_T$. This is due to the fact that 
$x_0 \in y_j$ for at least one $y_j \in Y$ in SET COVER. Let us call this $y_j$ $y_1$. 
As per our assumption for this case, the edge $(s, y_1)$ is not in $S_0$. This 
implies that $y_1$ is a leaf vertex in $S_0$, and has another $y_j$ (say $y_2$) as its 
parent. To obtain a new solution $S_1$, we can delete the edges $(y_1, y_2)$ and 
$(s, x_0)$, and replace them with $(s, y_1)$ and $(x_0, y_1)$. In $S_1$, the vertex $x_0$ 
remains a leaf, but $y_1$ is no longer a leaf. The delay constraints are still 
obeyed at all vertices. We have 


cost[$S_0$] - cost[$S_1$] = cost[$(s, x_0)$] + cost[$(y_1, y_2)$] - cost[$(s, y_1)$] 
- cost[$(x_0, y_1)$] = (n + 1) + 1 - n - 1 = 1 


(that is, the cost decreases). 


To obtain $S_1$ from $S_0$, at most $|X|$ edges of the form $(s, x_i)$ need to be 
replaced, and each replacement takes a constant amount of time. Thus $S_1$ 
can be obtained from $S_0$ in time $O(|X|) = O(n)$, a time polynomial in the 
number of elements in the ground set of SET COVER. Once all edges of the 
form $(s, x_i)$ have been eliminated from $S_0$, the resulting feasible solution $S_1$ 
will be as described below. 


19.3.4.1 Structure of $S_1$ 


The parents of the x’s in a feasible solution (FS) have to be y’s; such y’s 
should in turn have s as their parent. Also due to the delay constraint, a path 
such as $s - y_j - y_r - x_i$ can be ruled out for any $1 \le i \le p$ and $1 \le j < r \le q$. 

Not all y’s need to have s as their parent; some of the y’s can have another 
y (say $y_a$, for example) as their parent, as long as $y_a$’s parent is s (recall the 
delay constraint of 2). Suppose we call a y such as $y_a$ a covering y and the 
rest non-covering y’s. The covering y’s together form a cover of the x’s; 
these y’s may or may not be leaves in an FS. The non-covering y’s will be 
leaves. In other words, a $y_j$ is 


• in the cover if s is $y_j$’s parent, and 

• not in the cover otherwise. In such a case, the parent of $y_j$ would be a 
covering y. The delay constraint forbids a non-covering $y_j$ to be the parent 
of an $x_i$ in an FS. 


It is sufficient for all the non-covering y’s to have a common parent. Let the 
cover size (the number of covering y’s) be k. If in Figure 19.2, we move the 
cover to the left (the y’s can be renumbered in such a way that $y_1$ through 
$y_k$ cover all elements in X), a feasible solution as described above will look 
like the one in Figure 19.3. 


Fig. 19.3 Feasible solution for our instance of CSP$_T$ (not all edges shown). 


Note that it is cheaper for the non-covering y’s to have one of the covering 
y’s as their parent, rather than s; it is cheaper by a factor of n. The spanning 
tree (the feasible solution in Figure 19.3) includes the following: 


• $(s, y_j)$ edges: k in number, each with a cost of n, 

• edges of the form $(y_i, y_j)$, where $(s, y_i)$ is part of the FS, and $(s, y_j)$ is not 
(in other words, $y_i$ is in the cover and $y_j$ is not): $q - k$ such edges, each 
with unit cost, 

• $(x_i, y_j)$ edges: p in number, each with unit cost. 


Thus the cost of the spanning tree of Figure 19.3 equals 

$$k(n) + (q-k)(1) + (p)(1) = kn + n - k - 1,$$


since $n = p + q + 1$. The cost, then, can be described as a function C(k), 
where 


$$C(k) = kn + n - k - 1, \quad 1 \le k \le q. \tag{19.4}$$
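
As a concrete check of (19.4), the tree of Figure 19.3 can be built from a given cover and its cost and delays evaluated edge by edge. This is a minimal sketch under our own conventions (ground set {0, ..., p-1}, tagged-tuple vertices, hypothetical function names); the costs and delays are those of Table 19.2.

```python
def tree_from_cover(p, subsets, cover):
    """Parent pointers of a spanning tree shaped like Figure 19.3.

    cover -- indices j of the covering y's; together they must cover X.
    """
    cover = sorted(cover)
    assert set().union(*(subsets[j] for j in cover)) == set(range(p))
    parent = {}
    for j in range(len(subsets)):
        # covering y's hang off s; the others share one covering parent
        parent[("y", j)] = ("s",) if j in cover else ("y", cover[0])
    for i in range(p):
        # each x_i hangs off some covering y_j that contains it
        parent[("x", i)] = ("y", next(j for j in cover if i in subsets[j]))
    return parent


def edge_cost_delay(u, v, subsets, n):
    """Cost and delay of edge (u, v) according to Table 19.2."""
    kinds = {u[0], v[0]}
    if kinds == {"s", "y"}:
        return n, 1
    if kinds == {"s", "x"}:
        return n + 1, 2
    if kinds == {"x", "y"}:
        x, y = (u, v) if u[0] == "x" else (v, u)
        return 1, (1 if x[1] in subsets[y[1]] else 2)
    return 1, 1  # y-y or x-x edges


def tree_cost_and_max_delay(p, subsets, cover):
    """Total edge cost of the tree and the largest source-to-vertex delay."""
    n = p + len(subsets) + 1
    parent = tree_from_cover(p, subsets, cover)
    total, worst = 0, 0
    for v in parent:
        total += edge_cost_delay(v, parent[v], subsets, n)[0]
        d, node = 0, v
        while node != ("s",):          # walk up to the source
            d += edge_cost_delay(node, parent[node], subsets, n)[1]
            node = parent[node]
        worst = max(worst, d)
    return total, worst
```

With p = 4, the collection {0,1}, {1,2}, {2,3}, {0,3} and the cover {y_0, y_2}, we have n = 9 and k = 2, so the tree should cost C(2) = 18 + 9 - 2 - 1 = 24 with no delay exceeding 2.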


19.3.4.2 Correspondence Between Feasible Solutions 


Note that there is a one-to-one correspondence between feasible solutions in 
SET COVER and $S_1$ for our instance of the CSP$_T$. A set of covering y’s in 
our CSP$_T$ instance can also be used as a cover in SET COVER. In the other 
direction, a cover in SET COVER can be transformed to a set of covering y’s 
in our CSP$_T$ instance; these will have s as their parent in an FS. The other 
(non-covering) y’s will have one of the covering y’s as their parent, and the 
x’s will have the covering y’s as their parent(s). 




For this reduction, I is a SET COVER instance, S is a solution to I, J is 
our instance of CSP$_T$ corresponding to I, and T is a solution to J. From the 
above argument, we have the following lemma. 


Lemma 1. A SET COVER instance I has a solution with cardinality k 
($1 \le k \le q$) iff the corresponding CSP$_T$ instance J has a solution with a total 
cost C(k). 

For any approximation algorithm for CSP; to obtain the least possible 
cover size, the costs C(k) should monotonically increase from C(1) through 
C(q), and this is indeed the case with Equation (19.4). Further note that 
the reduction from SET COVER can be carried out in polynomial time. To 
complete the proof that this is an E-reduction, we only need to show that 
the error condition (19.2) is satisfied for some constant $\beta$. 


19.3.5 Proof of E-Reduction 


Let k ($\le q$) be the value of any feasible solution to SET COVER, and l be 
that of the optimal solution. Obviously, $l \le k \le q$. Therefore 


$$e(I,S) = \frac{k}{l} - 1 = \frac{k-l}{l},$$

and from (19.4), 

$$e(J,T) = \frac{C(k)}{C(l)} - 1 = \frac{kn+n-k-1}{ln+n-l-1} - 1 = \frac{(n-1)(k-l)}{nl+n-l-1}.$$


We need to find a constant $\beta$ such that $\beta\, e(J,T) \ge e(I,S)$, or 


$$\beta\, \frac{(n-1)(k-l)}{nl+n-l-1} \ge \frac{k-l}{l},$$


or 


$$\beta \ge 1 + l^{-1}. \tag{19.5}$$


The second term in (19.5), $l^{-1}$, is bounded by $0 < l^{-1} \le 1$. Thus it is 
sufficient to find a $\beta \ge 2$. Let us set $\beta = 2$. This completes the proof of 
E-Reduction. Thus we have shown that the following theorem holds. 
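
The step from the inequality on the error terms to (19.5) rests on one factorization of the denominator. A short worked derivation, using only the quantities defined above and assuming $n > 1$ and $k > l$ (when $k = l$ both errors are zero and any $\beta$ works), is:

```latex
\begin{align*}
nl + n - l - 1 &= (n-1)(l+1), \\
e(J,T) &= \frac{(n-1)(k-l)}{(n-1)(l+1)} = \frac{k-l}{l+1}, \\
\beta\, e(J,T) \ge e(I,S)
  &\iff \beta\,\frac{k-l}{l+1} \ge \frac{k-l}{l}
  \iff \beta \ge \frac{l+1}{l} = 1 + l^{-1}.
\end{align*}
```
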
Theorem 1. SET COVER E-reduces to a CSP$_T$. 


Corollary 1. A CSP$_T$ does not belong to APX. Further, CSP$_T$ cannot be 
approximated to within $O(\log n)$ unless $NP \subseteq DTIME(n^{\log\log n})$, where n is 
the number of nodes in the network. 


19.4 Conclusions 


From Remark 4, it follows that certain versions of the minimum spanning tree 
problem that are of interest in data networking (which need not necessarily 
be single-source) have an approximation upper bound of |V|, the number of 
nodes in the network, when the edge costs obey the triangle inequality. In 
particular, 


• the hop-constrained version HCSP$_T$, 
• the delay-constrained version CSP$_T$, and 
• the diameter-constrained versions (weighted as well as unweighted) 


have an upper bound of |V| on the performance ratio of any approximation 
algorithm. 

The result from Section 19.3 extends to the case of constrained Steiner 
trees which satisfy the triangle inequality, since CSP; is a special case of such 
problems for Steiner trees. Specifically, we can conclude that the following 
theorem holds. 


Theorem 2. The following single-source problems with edge costs obeying the 
triangle inequality cannot have an approximation heuristic A that can 
guarantee a performance ratio $R_A$ better than $O(\log n)$ unless 
$NP \subseteq DTIME(n^{\log\log n})$, and hence these problems cannot be in APX: 


• the delay-constrained spanning tree problem CSP$_T$, and 
• the delay-constrained Steiner tree CST$_T$. 


The CST$_T$ is the triangle-inequality version of Problem 1 in Figure 19.1. 
Both the edge costs and delays in the delay-constrained problem versions 
mentioned in this section need to obey the triangle inequality. 


Chapter 20 
Parallel line search 


T. C. Peachey, D. Abramson and A. Lewis 


Abstract We consider the well-known line search algorithm that iteratively 
refines the search interval by subdivision and bracketing the optimum. In our 
applications, evaluations of the objective function typically require minutes 
or hours, so it becomes attractive to use more than the standard three steps 
in the subdivision, performing the evaluations in parallel. A statistical model 
for this scenario is presented giving the total execution time T in terms of 
the number of steps k and the probability distribution for the individual 
evaluation times. Both the model and extensive simulations show that the 
expected value of T does not fall monotonically with k; in fact, more steps may 
significantly increase the execution time. We propose heuristics for speeding 
convergence by continuing to the next iteration before all evaluations are 
complete. Simulations are used to estimate the speedup achieved. 


Key words: Line search, parallel computation 


20.1 Line searches 


A line search involves finding the minimal value of a real function g of a single 
real variable x. We attempt to locate the minimizing argument to within a 
“tolerance.” Formally, given an interval $[a,b] \subset \mathbb{R}$, a function $g : [a,b] \to \mathbb{R}$ 
and a tolerance d, we require p, q such that $x^* \in [p,q] \subseteq [a,b]$ where g is 
minimal at $x^*$ and $q - p < d$. We assume that the derivative, if it exists, is 
unknown. 

Apart from their use in one-dimensional optimization, line searches are 
used in optimization on domains of higher dimension. For example, the quasi- 
Newton search methods use repeated cycles of determining the search direc- 
tion and then performing a line search in that direction. 

The line search algorithm is one of repeated subdivision of the interval 
and restriction to a subinterval. It can be summarized as follows: 


1. Enter initial interval [a,b] and tolerance d. 
2. Set p = a, q = b. 
3. Subdivide [p,q] with points $p = x_0 < x_1 < x_2 < \dots < x_k = q$, where $k \ge 3$. 
4. Compute $g_i = g(x_i)$ for i = 0, 1, 2, ..., k. 
5. Select $x_m$ such that $g_m = \min_i g_i$, the point where g is least. 
6. If m = 0 replace p by $x_0$ and q by $x_1$, 
   else if m = k replace p by $x_{k-1}$ and q by $x_k$, 
   else replace p by $x_{m-1}$ and q by $x_{m+1}$. 
7. If q - p < d then return (p,q), 
   else go to Step 3. 
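
The seven steps above can be sketched in a few lines. This is a serial illustration rather than the Nimrod/O implementation: it assumes equally spaced subdivision points, re-evaluates every point in each iteration (no caching), and tests the tolerance before the first subdivision as well as after each one. The name `line_search` is ours.

```python
def line_search(g, a, b, d, k=3):
    """Repeated subdivision of [a, b] into k equal steps (Steps 1-7).

    Returns (p, q) with q - p < d. The bracket is only guaranteed to
    contain the minimizer when g is unimodal on [a, b].
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    p, q = a, b                                              # Step 2
    while q - p >= d:                                        # Step 7
        xs = [p + (q - p) * i / k for i in range(k + 1)]     # Step 3
        gs = [g(x) for x in xs]                              # Step 4
        m = min(range(k + 1), key=gs.__getitem__)            # Step 5
        # Step 6: keep the neighbors of x_m, clipped at the end points
        p, q = xs[max(m - 1, 0)], xs[min(m + 1, k)]
    return p, q
```

For instance, `line_search(lambda x: (x - 0.3) ** 2, 0.0, 1.0, 1e-3, k=5)` returns a bracket narrower than 0.001 around x = 0.3.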


Clearly the algorithm will terminate if $\sup (x_i - x_{i-1})/(q - p) < 1/2$, where the 
supremum is taken over both steps in the line search and iterations of that 
search. The process yields an interval [p,q] which is guaranteed to contain 
the minimum if g is unimodal on [a, b]. Usually k is 3 as this is more efficient 
in terms of the number of function evaluations. It has long been known that 
the “Fibonacci search” [5] will minimize the number of function evaluations 
in the worst case. If g is approximately quadratic near the minimum then 
alternative methods such as Powell’s [6] can be expected to be more efficient. 

We are concerned with applications where each function evaluation may 
take at least several minutes on a fast processor. For example, g may represent 
aerodynamic drag on an object where x is some shape parameter, so a flow 
simulation would be required for each function evaluation. Further, we assume 
that batches of evaluations may be performed concurrently, on a cluster of 
computers or using the resources of the global grid. Clearly in such cases the 
speed of convergence may be improved by using more than three steps in each 
subdivision. These “parallel line searches” are the subject of this chapter. 


20.2 Nimrod/O 


Nimrod/O [1, 2] is an optimization package designed for the scenario 
described above, that is, long evaluation times employing multiple processors. 
The user prepares a “schedule file” such as the one in Figure 20.1. This 
specifies the problem parameters, any constraints linking them, how the objective 
function is to be evaluated and the optimization algorithm to be used. This 
example uses two algorithms, the downhill simplex and the method of 
Broyden, Fletcher, Goldfarb and Shanno (BFGS), each run 5 times with different 
starting points. 


    parameter alpha float range from 1 to 15
    parameter tcmax float range from 0.5 to 1.5
    parameter cmax float range from 0.5 to 1.0
    constraint alpha >= tcmax + 2.0*cmax

    task main
        copy * node:.
        node:substitute skeleton foil.inp
        node:execute run.all
        copy node:obj.dat output.$jobname
    endtask

    method simplex
        starts 5
            starting points random
            tolerance 0.01
        endstarts
    endmethod
    method bfgs
        starts 5
            starting points random
            tolerance 0.01
            line steps 8
        endstarts
    endmethod


Fig. 20.1 A sample configuration file. 

The architecture of Nimrod/O is shown in Figure 20.2. Rectangles rep- 
resent separate processes. The Controller reads the schedule and launches a 
process for each optimization. When an optimization requires a set of objec- 
tive evaluations, it first checks the Cache to determine which jobs have already 
been run. Jobs that are new are sent to the dispatcher which is either the 
“Nimrod” system [1] or its commercial version “enFuzion.” The dispatcher 
may run evaluations on the local machine, or on a cluster of machines or 
perhaps on the world grid. 

Note that this architecture allows separate optimizations to be run in 
parallel. Within each optimization we have endeavored to speed the algorithms 
by employing parallel evaluations where possible. For example our 
implementation of the BFGS algorithm uses a parallel line search and also concurrent 
evaluations in the determination of the search direction; we call this 
implementation “Parallel-BFGS.” 


Fig. 20.2 Architecture of Nimrod/O. 


Currently Nimrod/O is being applied in three areas: 


• Design of an aerofoil. Here a two-dimensional aerofoil is specified in terms 
of three shape parameters. A FLUENT simulation is used to compute 
the flowfield around the aerofoil and compute lift and drag. The design 
problem is to determine the shape parameters that maximize the ratio of 
lift to drag. 

• Optimal fatigue life. Finite element models are used to predict the life 
of mechanical components with pre-existing cracks under a cyclical stress 
regime. We require the component shape that maximizes this life. 

• Image compression. We consider a compression method based on the 
mammalian vision system which involves up to 96 parameters. The parameters 
are to be selected to minimize the compression ratio. 


It was noticed during the aerofoil study that the execution time for eval- 
uations was bimodal. Most jobs took about 30 minutes but occasional ones 
required between 3 and 4 hours. Consequently some of the line searches had 
completed all but one of the evaluations in less than 40 minutes and then 
required about 3 more hours to finish the last one. (There was no obvious 
pattern to the values of the domain that gave rise to long execution times.) 
This raised two issues: 


A: A smaller number of steps in the line search may achieve faster 
convergence as fewer jobs are less likely to include an exceptionally 
long one. 

B: Faster completion may be provided by a mechanism for aborting longer 
jobs and proceeding to a subinterval identified by the completed jobs. 


We consider Hypothesis A in Section 20.3 and B in Section 20.4. 


20.3 Execution time 


20.3.1 A model for execution time 


This section presents a model for the execution time for a line search, in 
terms of the number of steps used. 

Suppose that each iteration of the line search uses $k \ge 3$ steps; we assume 
that the points are equally spaced. Let l be the length of the original search 
interval. Each iteration reduces the length of the current domain to a proportion 
2/k of the previous (or 1/k if the minimum happens to fall at an end point). Let 
r iterations be the most required to reduce the length to the tolerance d, so r 
is the least integer such that $l(2/k)^r \le d$. Hence 


$$r = \mathrm{ceil}\left(\frac{\log(l/d)}{\log(k/2)}\right), \tag{20.1}$$


where ceil(x) signifies the least integer that is not less than x. We write $T_i$ 
for the evaluation time for the ith subdivision point and assume that all the 
$T_i$ have the same probability density function f(t) and distribution function 
F(t). We write s for the number of evaluations required in an iteration. Note 
that, after the first iteration, subsequent ones will not require evaluations at 
the end points of the subinterval. Further, if k is even and the best point 
in the previous interval was internal, then the objective at the midpoint of 
the current interval will have been found in the previous iteration. So we 
approximate s by k - 2 if k is even and k - 1 if k is odd. As these evaluations 
are performed in parallel, the evaluation time for one iteration is $B = \max_i T_i$. 
For the scenario discussed above these times are much larger than the times 
required for selection of the subdivision points and comparison of the values 
there. So we assume that the time for each iteration is just B. We assume 
also that the $T_i$ are statistically independent. Under this condition, see for 
example [3], the distribution function for B is $F(t)^s$. Thus the mean time for 
completion of a batch is approximately 


$$M = \int_0^\infty t\, \frac{d}{dt}\left[F(t)^s\right] dt. \tag{20.2}$$


Hence the expected time for the complete optimization is 


$$E = Mr = \mathrm{ceil}\left(\frac{\log(l/d)}{\log(k/2)}\right) \int_0^\infty t\, \frac{d}{dt}\left[F(t)^s\right] dt. \tag{20.3}$$


20.3.2 Evaluation time a Bernoulli variate 


As a model of the bimodal distribution encountered with the wing flow 
experiments, consider the case where the execution time for a single job has a 
discrete distribution 

$$f(t) = a\,\delta(t - x) + (1 - a)\,\delta(t - y), \tag{20.4}$$


where $\delta$ is the Dirac delta and a, x and y are constants with $0 < a < 1$ and 
$x < y$. Then (20.2) becomes 


$$M = x a^s + y(1 - a^s) \tag{20.5}$$

and (20.3) becomes 

$$E = \left[x a^s + y(1 - a^s)\right] \mathrm{ceil}\left(\frac{\log(l/d)}{\log(k/2)}\right). \tag{20.6}$$


Graphs of these functions are shown in Figure 20.3. Figure 20.3(a) shows 
how r decreases in a piecewise manner. Figure 20.3(b) gives M for the case 
x = 1, y = 8, l/d = 1000 and a = 0.9. Figure 20.3(c) shows E, the product 
of r and M. Since M increases with k and r is piecewise constant, E increases 
while r is constant. 
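
The behaviour described above is easy to reproduce from (20.1), (20.5) and (20.6). The sketch below is illustrative only: the function names are ours, s is approximated as in Section 20.3.1, and r is computed with exact rational arithmetic (an implementation choice, made so that boundary cases such as k = 20 with l/d = 1000 are not disturbed by floating-point rounding).

```python
from fractions import Fraction


def iterations(k, ratio):
    """Least r with (2/k)**r * ratio <= 1, i.e. (20.1) with ratio = l/d."""
    r = 0
    while Fraction(2, k) ** r * ratio > 1:
        r += 1
    return r


def batch_size(k):
    """Approximate number s of new evaluations per iteration."""
    return k - 2 if k % 2 == 0 else k - 1


def mean_batch_time(k, x, y, a):
    """Equation (20.5): job time is x with probability a, otherwise y."""
    s = batch_size(k)
    return x * a ** s + y * (1 - a ** s)


def expected_total_time(k, x, y, a, ratio):
    """Equation (20.6): E = M * r."""
    return mean_batch_time(k, x, y, a) * iterations(k, ratio)


if __name__ == "__main__":
    for k in range(3, 31):
        print(k, iterations(k, 1000), expected_total_time(k, 1, 8, 0.9, 1000))
```

With x = 1, y = 8, a = 0.9 and l/d = 1000, r drops from 4 to 3 at k = 20, and E is larger at k = 21 than at k = 20 because M has grown while r has not changed.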


Fig. 20.3 Performance with Bernoulli job times. (a) Number of iterations, r. 
(b) Expected time per iteration, M. (c) Expected time for line search, E. 


20.3.3 Simulations of evaluation time 


For other distributions of job times, computation of (20.2) becomes difficult 
so we have performed simulations instead. The line search was performed on 
the function $g(x) = e^{-x}\sin(20x)$ on the domain [0,1], shown in Figure 20.4. 
This function has four local minima with a global minimum at $x \approx 0.2331$. 
The tolerance used was 0.001. Job times were generated randomly from (a) an 
exponential distribution with parameter 2 and (b) a rectangular distribution 
on [0, 1]. 
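
A simulation of this kind might be sketched as follows. Several details are assumptions on our part rather than a description of the authors' experiment: "parameter 2" is read as the rate of the exponential distribution, only points that coincide exactly in floating point (in practice the end points) are reused between iterations, and the names `simulated_search` and `mean_time` are hypothetical.

```python
import math
import random


def g(x):
    """Test function of this section."""
    return math.exp(-x) * math.sin(20 * x)


def simulated_search(k, rng, a=0.0, b=1.0, d=1e-3, rate=2.0):
    """One parallel line search; returns (total time, final bracket)."""
    p, q, total = a, b, 0.0
    known = {}                      # objective values already computed
    while q - p >= d:
        # equally spaced points; the end points are reused exactly
        xs = [p] + [p + (q - p) * i / k for i in range(1, k)] + [q]
        new = [x for x in xs if x not in known]
        for x in new:
            known[x] = g(x)
        if new:
            # a batch runs in parallel, so it costs its longest job
            total += max(rng.expovariate(rate) for _ in new)
        m = min(range(k + 1), key=lambda i: known[xs[i]])
        p, q = xs[max(m - 1, 0)], xs[min(m + 1, k)]
    return total, (p, q)


def mean_time(k, runs=1000, seed=0):
    """Average total time over independent runs with k steps."""
    rng = random.Random(seed)
    return sum(simulated_search(k, rng)[0] for _ in range(runs)) / runs
```

Plotting `mean_time(k)` against k gives a curve of the same kind as the one discussed next, although the exact values depend on the assumptions listed above.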


Fig. 20.4 Test function g(x). 


Figure 20.5(a) shows the mean total execution time, averaged over 10,000 
runs, plotted against k for the exponential distribution. Each point is shown 
with error bars enclosing three standard errors. Figure 20.5(b) does the same 
for the rectangular distribution. Similar results were obtained for a wide 
variety of tolerance values. 

For some simulations the line search failed to locate the global minimum, 
converging on a local minimum instead. Figure 20.5(c) shows the “effective- 
ness,” the proportion of runs that achieved the global minimum. Here the 
algorithm is deterministic so effectiveness for a given k is either 0 or 1. In 
the next section the search will depend on the order of arrival of jobs and 
effectiveness will be fractional. 


20.3.4 Conclusions 


The preceding results show that increasing the number of steps in a parallel 
line search may be counter-productive; increases in k may produce 
considerable increases in E. For this to occur there must of course be variability in 
the job times. Note that Figure 20.5(b) shows much less increase than does 
Figure 20.5(a), although the mean and variance of the job times are similar. 
The significant factor is the probability of job times that are considerably 
larger than the mean. 


Fig. 20.5 Results of simulations. (c) Effectiveness versus k. 


A typical user of the line search algorithm will not have information on the 
distribution of job times. However the total time E(k) has local minima at 
points where r(k) decreases and these values can be predicted from knowledge 
of just the initial interval length l and the tolerance d. Consideration of (20.1) 
shows that r falls to a value $\rho$ at $k = \mathrm{ceil}\left(2\,(l/d)^{1/\rho}\right)$. This can be used to 
compute the number of steps k for a desired number of iterations $\rho$. 
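
As a small illustration of this rule, the least k for a target number of iterations can be found by searching for the smallest k with $(k/2)^\rho \ge l/d$, which is equivalent to the ceiling formula when the result is at least 3. The function name is ours, and exact rational arithmetic is used only to avoid rounding errors in the fractional power.

```python
from fractions import Fraction


def steps_for_iterations(rho, ratio):
    """Smallest k >= 3 with (k/2)**rho >= ratio, where ratio = l/d.

    For k >= 3 this equals ceil(2 * ratio**(1/rho)), computed exactly.
    """
    k = 3
    while Fraction(k, 2) ** rho < ratio:
        k += 1
    return k
```

With l/d = 1000, for example, three iterations require k = 20 steps and four iterations require k = 12.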

Our analysis has assumed that evaluation times are independent. If these 
are dependent, one may expect positive autocorrelation on the parameter 
space. This would lead to reduced variation in the later iterations of the line 
search which in turn would reduce growth in E between jumps. When the 
objective function is continuous but not unimodal we expect a priori that 
increasing the value of k makes attaining the global minimum more likely. 
Figure 20.5(c) supports this. 


20.4 Accelerating convergence by incomplete iterations 


20.4.1 Strategies for aborting jobs 


We consider strategies for proceeding to the next iteration of a line search 
before the evaluations for all points in the current iteration are complete. 
Three heuristics are proposed. 

Figure 20.6(a) illustrates a situation where 5 of the 7 evaluations of a 
function g(x) are complete. The minimum so far occurs at x = 3 and the
neighbors of that point have been evaluated. If g is unimodal then clearly 
the minimum is in the range [2,4]. Thus the remaining evaluations may be 
aborted and the line search can proceed to the next stage. This leads to the 
following algorithm for one iteration of a line search: 


Fig. 20.6 Incomplete evaluation points. (a) Strategy 1. (b) Strategy 2.

Strategy 1.
Suppose an iteration involves determination of objective values
$g_0, g_1, \ldots, g_k$. At any time suppose that S represents the set of the
$g_i$ that have been completed by parallel evaluation. When each new value
$g_j$ arrives:
    add it to the set S
    determine $g_m$, the least value in S
    if $0 < m < k$ and $g_{m-1}, g_{m+1} \in S$ return $[x_{m-1}, x_{m+1}]$
    else if $m = 0$ and $g_1 \in S$ return $[x_0, x_1]$
    else if $m = k$ and $g_{k-1} \in S$ return $[x_{k-1}, x_k]$
    continue
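Strategy 1 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function names and the dictionary used to represent S are assumptions, and ties for the least value are broken by arrival order, a choice the text does not specify.

```python
from typing import Dict, Iterable, Optional, Tuple


def strategy1_check(done: Dict[int, float], k: int) -> Optional[Tuple[int, int]]:
    """Return (lo, hi) so that [x_lo, x_hi] brackets the minimum, or None.

    `done` maps the index i of each completed evaluation to its value g_i.
    """
    if not done:
        return None
    m = min(done, key=done.get)          # index of the least value so far
    if 0 < m < k and (m - 1) in done and (m + 1) in done:
        return (m - 1, m + 1)
    if m == 0 and 1 in done:
        return (0, 1)
    if m == k and (k - 1) in done:
        return (k - 1, k)
    return None                           # keep waiting for more values


def run_iteration(arrivals: Iterable[Tuple[int, float]], k: int):
    """Feed (index, value) pairs in arrival order; stop as soon as allowed."""
    done: Dict[int, float] = {}
    for i, g in arrivals:
        done[i] = g
        bracket = strategy1_check(done, k)
        if bracket is not None:
            return bracket, len(done)
    return None, len(done)


if __name__ == "__main__":
    # Hypothetical values for 7 points (k = 6); points 1 and 5 never arrive.
    arrivals = [(0, 10.0), (6, 10.0), (2, 2.0), (4, 2.0), (3, 1.0)]
    print(run_iteration(arrivals, 6))     # ((2, 4), 5)
```

In the usage example the iteration stops after 5 of the 7 evaluations, mirroring the situation described for Figure 20.6(a).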
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This approach can be extended to returning a greater interval than 
that provided by the immediate neighbors of the minimum point. In Fig- 
ure 20.6(b), if g is unimodal then the minimum is in the interval [2, 5]; it may 
be worthwhile terminating the iteration with this interval. Many variants of 
this idea are possible. We investigate only the following. 


Strategy 2.
Construct S as in Strategy 1. When each new value $g_j$ arrives:
    add it to the set S
    determine $g_m$, the least value in S
    if $0 < m < k$
        if $g_{m-1}, g_{m+1} \in S$ then return $[x_{m-1}, x_{m+1}]$
        else if $g_{m-1}, g_{m+2} \in S$ return $[x_{m-1}, x_{m+2}]$
        else if $g_{m-2}, g_{m+1} \in S$ return $[x_{m-2}, x_{m+1}]$
    else if $m = 0$
        if $g_1 \in S$ then return $[x_0, x_1]$
        else if $g_2 \in S$ return $[x_0, x_2]$
    else if $m = k$
        if $g_{k-1} \in S$ then return $[x_{k-1}, x_k]$
        else if $g_{k-2} \in S$ return $[x_{k-2}, x_k]$
    continue
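The wider brackets allowed by Strategy 2 can be sketched in the same way. Again this is only an illustrative sketch: the function name, the dictionary representation of S and the sample values are assumptions, not part of the original experiments.

```python
from typing import Dict, Optional, Tuple


def strategy2_check(done: Dict[int, float], k: int) -> Optional[Tuple[int, int]]:
    """Return (lo, hi) so that [x_lo, x_hi] brackets the minimum, or None.

    `done` maps the index i of each completed evaluation to its value g_i.
    Indices outside 0..k are never in `done`, so the lookups below are safe.
    """
    if not done:
        return None
    m = min(done, key=done.get)          # index of the least value so far
    if 0 < m < k:
        if (m - 1) in done and (m + 1) in done:
            return (m - 1, m + 1)
        if (m - 1) in done and (m + 2) in done:
            return (m - 1, m + 2)
        if (m - 2) in done and (m + 1) in done:
            return (m - 2, m + 1)
    elif m == 0:
        if 1 in done:
            return (0, 1)
        if 2 in done:
            return (0, 2)
    else:                                 # m == k
        if (k - 1) in done:
            return (k - 1, k)
        if (k - 2) in done:
            return (k - 2, k)
    return None


if __name__ == "__main__":
    # Hypothetical values: point 4 is still running, the minimum so far is at 3.
    print(strategy2_check({0: 10.0, 2: 2.0, 3: 1.0, 5: 5.0, 6: 10.0}, 6))  # (2, 5)
```

The example corresponds to the kind of configuration discussed for Figure 20.6(b), where the interval [2, 5] is returned without waiting for the missing neighbor.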


If sufficient processors are available it may be advantageous to both con- 
tinue an iteration and to explore a subinterval identified as likely to contain 
the minimum. This leads to our third heuristic. 


Strategy 3. 

Use Strategy 2 to identify the subinterval and then start an iteration based 
on that interval, but also continue with the original iteration to completion. 
If later the original iteration finds a minimum better than any so far in the 
new iteration then the algorithm will “backtrack,” abort the new iteration and 
start another iteration based on this improved minimum. 


This is essentially a form of speculative computing, see [4]. Recursion allows 
a simple implementation. 

We also considered the effect of applying Strategies 1-3 only after the 
penultimate job has arrived, that is, when k of the k+1 evaluations have been 
completed. These heuristics will be denoted by 1p, 2p and 3p, respectively. 
A full search, completing each iteration before proceeding to the next, is 
denoted by F. 


20.4.2 Experimental results 


The strategies were implemented for line searches on the test function of
Figure 20.4 with tolerance 0.001. For each k from 3 to 70, the search process
was simulated 10,000 times with execution times selected randomly from
some probability distribution.

Fig. 20.7 Strategy 1 with exponential distribution of job times.

Strategies 1 and 1p were applied using exponential evaluation times with
mean 2. Figure 20.7(a) shows the mean execution times and Figure
20.7(b) the effectiveness. In each case the results for these strategies are 
compared with those for a full search. Figure 20.8 shows times for the same 
range of strategies but with evaluation times from a rectangular distribution 
over the interval [0, 1]. 
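The kind of simulation described above can be illustrated for a single iteration. The following is a minimal sketch under stated assumptions: all k + 1 evaluations start together on separate processors, job times are exponential with mean 2, and the objective is a hypothetical unimodal quadratic rather than the test function of Figure 20.4. It compares the time of a full iteration with the time at which Strategy 1 would stop.

```python
import random


def one_iteration_times(g, rng, mean=2.0):
    """Return (full_time, strategy1_time) for one simulated iteration.

    g[i] is the objective value at point i; job times are exponential.
    """
    k = len(g) - 1
    finish = [rng.expovariate(1.0 / mean) for _ in range(k + 1)]
    done = {}
    for i in sorted(range(k + 1), key=finish.__getitem__):  # arrival order
        done[i] = g[i]
        m = min(done, key=done.get)
        lo, hi = max(m - 1, 0), min(m + 1, k)
        # Strategy 1 stops once the available neighbors of the minimum are in.
        if lo in done and hi in done:
            return max(finish), finish[i]
    return max(finish), max(finish)


def mean_times(k=10, runs=2000, seed=1):
    """Average full and Strategy 1 iteration times over `runs` simulations."""
    rng = random.Random(seed)
    g = [(i / k - 0.37) ** 2 for i in range(k + 1)]  # hypothetical objective
    full = early = 0.0
    for _ in range(runs):
        f, e = one_iteration_times(g, rng)
        full += f
        early += e
    return full / runs, early / runs


if __name__ == "__main__":
    print(mean_times())
```

Because Strategy 1 can always stop by the time the last job finishes, the second mean can never exceed the first; how much smaller it is depends on k and on the job time distribution.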

These experiments were repeated with the other strategies. Figure 20.9
shows results for the same method as Figure 20.7 but with Strategies 1 and
1p replaced by 2 and 2p. Similarly Figure 20.10 shows results for Strategies
3 and 3p.

Fig. 20.9 Results for Strategy 2.

Fig. 20.10 Results for Strategy 3.


20.4.3 Conclusions


For job times with an exponential distribution, Strategy 1 shows a speedup
of between 2 and 3 for k > 12, less for k < 12. This increased
speed is at the expense of a deterioration in the effectiveness of the search.
Strategy 1p is intermediate in performance between F and Strategy 1 for
both execution times and effectiveness. The experiments with a rectangular 
distribution of job times showed less speedup, as there was less increase in 
M with k. Strategy 2 gave more speedup than that of Strategy 1 but with a 
further loss of effectiveness. Strategy 3 gave a speedup almost identical to that 
of Strategy 2 but with improved effectiveness. Hence this strategy is to be 
preferred when occasional long jobs are delaying execution. This advantage 
is at the expense of the need for extra processors when two iterations are 
running concurrently. 
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Chapter 21 

Alternative Mathematical 
Programming Models: A Case 

for a Coal Blending Decision Process 


Ruhul A. Sarker 


Abstract Real-world problems are complex. It is not always feasible to in- 
clude all aspects of reality in the model of a problem. In most cases, we 
deal with a simplified version of the problem that contains only some as- 
pects of reality. Thus a problem can be modeled in a number of different 
ways depending on the portion of reality to be included or excluded. In this 
chapter, we address the alternative mathematical programming formulation 
approaches for a real-world coal-blending problem under different scenarios. 
The complexity of formulation and solution approaches, quality of solutions, 
and solution implementation difficulties for these models are compared and 
analyzed. Choice of the most appropriate model is suggested. 


Key words: Coal blending, alternative modeling, mathematical program- 
ming, linear programming, nonlinear programming 


21.1 Introduction 


In this chapter, we consider a real-world coal-blending problem. Coals are 
extracted and upgraded for the customers. The raw coals are known as run of 
mine (ROM) in the coal mining industry. Each coal has its own typical quality 
specifications. Coal quality is measured in terms of percent of ash, sulfur
and moisture, BTU content per pound, and metallurgical
properties. BTU content per pound expresses the heating value of coal. Higher 
ash content lowers the BTU content value. Sulfur in coal results in sulfur di- 
oxide emission that pollutes the environment. Water particles in the coal 
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absorb heat to evaporate and then superheat. The customers specify the 
quality parameters (maximum percentage of ash and sulfur, and minimum 
BTU/pound) for their coals. 

The Coal Company considered in the present research currently operates 
three mines. These mines differ greatly in their cost of production and coal 
quality. Mine-3 is a relatively low cost mine, but its coal contains high sulfur 
and does not have satisfactory metallurgical properties. On the other hand, 
it contains reasonably low ash. Mine-1 is the highest cost mine, and the coal 
contains relatively high ash (stone) and medium sulfur but it has excellent 
metallurgical properties. Mine-2 is the largest and lowest cost mine. Its coal 
contains higher ash and sulfur than mine-1 coal, but it has good metallurgical 
properties. Because of the coal properties, only mine-1 and mine-2 coals are 
used in the preparation of metallurgical coal. 

Preparation and blending are the two coal upgrading and processing facili- 
ties. Coal preparation (washing) is a process of removing physical impurities. 
The process involves several different operations, including crushing (to cre- 
ate a size distribution), screening (to separate sizes) and separators (mainly 
cyclones, to remove the physical impurities). The objective of running a coal 
preparation plant is to maximize the revenue from clean coal while removing 
the undesirable impurities. 

The processing of ROM coal from mine-3 in the preparation plant does not 
improve the quality of coal with a reasonable yield. Therefore, the involve- 
ment of the preparation plant with this low quality ROM coal means a lower 
financial performance for the company. The customers do not accept these 
high sulfur coals for their plant operations because of environmental pollution
restrictions. The conversion of low quality ROM coals to a minimum acceptable
quality level will mean a better financial performance for the company.
A blending process provides an opportunity for quality improvement.

Blending is a common process in the Coal Industry. Blending allows up- 
grading the low quality run of mine coals by mixing with good quality 
coals. Furthermore, supplying the good quality ROM coals to the customers, 
through blending, can reduce the cost of production, because it saves (i) the 
cost of washing and (ii) lost BTU from the refuses of the preparation plant, 
and (iii) it also eliminates the need for capital investment in washing facil- 
ities. Most of the thermal coal customers accept blended products if they 
satisfy their quality requirements. 

In the blending process, the problem is to determine the quantity required
from each run-of-mine coal and washed product that maximizes the revenue
while satisfying the quality constraints of the customers.

A single period coal-blending problem can be formulated as a simple 
linear programming model ([Gershon, 1986], [Hooban and Camozzo, 1981], 
[Bott and Badiozamani, 1982], [Gunn, 1988], [Gunn et al., 1989], [Gunn and
Chwialkowska, 1989], and [Gunn and Rutherford, 1990]). For the multiperiod
case ([Sarker, 1990], [Sarker, 1991], [Sarker and Gunn, 1990], [Sarker, 1994],
[Sarker and Gunn, 1991], [Sarker and Gunn, 1997], [Sarker and Gunn, 1995],
[Sarker and Gunn, 1994], and [Sarker, 2003]), the modeling process depends
on the decision whether to carry inventory of run-of-mine (ROM) or of 
blended coal (final product). The multiperiod coal-blending problem with 
inventory of blended coal is a nonlinear program. On the other hand, the 
multiperiod blending problem with inventory of ROM can be formulated as 
a linear program. In this case, a number of alternative LP models can be 
developed allowing the use of ROM inventory in n future periods. 

A large-scale LP is solvable using any of the standard LP packages. How- 
ever, a large-scale nonlinear program is complex and is not easy to solve. The 
current model is a specially structured nonlinear program, and is solved
using a simple SLP (Successive Linear Programming) algorithm developed by
[Sarker and Gunn, 1997]. The solutions of some multiperiod LP models are
not practically feasible for several technical reasons. The quality of solutions, 
complexity of formulation and solution approaches, and solution implemen- 
tation difficulties for these models are compared and analyzed. A choice of 
the most appropriate model is suggested. 

The chapter is organized as follows. Following the introduction, we discuss
four alternative models for coal blending and upgradation. The flexibility
of these models is analyzed in Section 21.3. Section 21.4 discusses the problem
sizes and computational time required. The objective function values and the
nature of fluctuating situations for the test problems are presented in Section
21.5. The selection criteria for choosing the most appropriate model are
discussed in Section 21.6 and the conclusions are drawn in Section 21.7.


21.2 Mathematical programming models 


The single period LP model is formulated to determine an optimal strategy 
for coal blending, washing and customer allocation so as to transform the 
available run of mine coal into products within customer market specifica- 
tions at maximum overall profit. The constraints considered in this model 
are the maximum and minimum allowable limits of ash, sulfur and BTU con- 
tent, production limits, demand requirements, etc. In the multiperiod case, 
the objective function and constraint types are similar to the single period 
model. However, the inventory of ROM and/or blended product is used as 
the linking mechanism from one period to the next. The problem formulation 
when considering inventory of ROM becomes a linear program, whereas with 
inventory of blended product it becomes a nonlinear program. Any ROM 
extracted in period t can be used in the blending process in any or all future 
periods. This assumption controls the size of LP in the multiperiod formula- 
tion. The planning horizon considered is 12 months, in 1-month-long periods. 
We consider four different models in this chapter. These models are defined 
as follows: 


• SPM: Single period model.
• MNM: Multiperiod nonlinear model.
• MLM: Multiperiod linear model.
• ULM: Upper bound linear model.


To give an idea about the mathematical models of coal blending and
upgradation, we will discuss the above four models briefly in this section.
All the models consider M mines, NS local customers, K washed coal
customers, CT thermal coal customers (on a (metric) tonne basis) and CB
thermal coal customers (on a BTU content basis). The company has both coal
washing/upgrading and coal blending facilities. The blending process accepts
both ROMs and washed coal to produce blended coals. The objectives of these
models are to find appropriate production plans to satisfy the customers'
demand for a given number of periods while satisfying the following constraints:


• Demand constraints for all types of customers (maximum and minimum
  requirements are known in advance)
• Mine production capacity (known maximum and minimum capacity)
• Wash plant capacity constraints
• Quality constraints such as allowable upper and lower limits of percentages
  of ash, sulfur and BTU content per pound, and
• Overall sulfur emission constraint for environmental control


21.2.1 Single period model (SPM) 


The details of the SPM are presented below. The SPM considers a single
period of 1 month.


Variables

$bp_{jl}$     (metric) tonnes of blended product for customer j made at
              location l
$c_{mjl}$     (metric) tonnes of run-of-mine coal from mine m used for blended
              product j at location l
$wc_k$        (metric) tonnes of washed product k produced
$wb_{kjl}$    (metric) tonnes of washed product k used for blended product j
              at location l
$mb_{kjl}$    (metric) tonnes of middling product k used for blended product
              j at location l
$wp_{kc}$     (metric) tonnes of washed product k sent to customer c

Data

$J$                             number of blended product customers
$L(j)$                          set of sites used for blended product for customer j
$a^c_m, s^c_m, \beta^c_m$       run-of-mine ash, sulfur and BTU/lb analysis for
                                mine m
$a^w_k, s^w_k, \beta^w_k$       as received ash, sulfur and BTU/lb analysis for
                                washed product k
$a^m_k, s^m_k, \beta^m_k$       as received ash, sulfur and BTU/lb analysis for
                                middling product k
$a^+_j, s^+_j$                  maximum allowable ash and sulfur analysis for
                                customer j
$a^-_j, s^-_j, \beta^-_j$       minimum allowable ash, sulfur and BTU/lb analysis
                                for customer j
$BTU^+_j, BTU^-_j$              maximum and minimum BTU requirements for
                                customer j
$S_{NS}$                        maximum allowable sulfur supplied to local customers
$NS$                            set of blended product customers who correspond to
                                local customers
$G_j$                           amount of SO$_2$ per (metric) tonne of sulfur
                                supplied to customer $j \in NS$
$M$                             number of mines
$a_{mk}$                        amount of run-of-mine coal from mine m used per
                                (metric) tonne of washed product k (this corresponds
                                to the washed product recipe)
$r_k$                           ratio of middling in product k produced
$MPR^+_m, MPR^-_m$              maximum and minimum production from mine m
$BCOST_{jl}$                    blending cost (dollar/(metric) tonne) for customer j
                                at location l
$MPRO_m$                        mining cost (dollar/(metric) tonne) for mine m
$B^c_m$                         BTU content (million BTU/(metric) tonne) for ROM
                                coal from mine m
$B^w_k$                         BTU content (million BTU/(metric) tonne) for washed
                                product k
$B^m_k$                         BTU content (million BTU/(metric) tonne) for
                                middlings of washed product k
$PB_j$                          price (dollar/million BTU) offered by the blended
                                customer j
$PP_{kc}$                       price (dollar/(metric) tonne) offered by customer c
                                for washed product k
$TCBC_{jl}$                     transportation cost (dollar/(metric) tonne) to blended
                                product customer j from blending location l
$TCML_{ml}$                     transportation cost (dollar/(metric) tonne) from mine
                                m to blending location l
$TCMW_m$                        transportation cost (dollar/(metric) tonne) from mine
                                m to VJ plant
$TCWL_l$                        transportation cost (dollar/(metric) tonne) from VJ
                                plant to blending location l
$TCWC_c$                        transportation cost (dollar/(metric) tonne) from VJ
                                plant to washed customer c (may include banking and
                                pier costs)

The objective function to be maximized, profit, is

$$
\begin{aligned}
Z ={} & \sum_j \sum_l \left[ -(BCOST_{jl} + TCBC_{jl}) \times bp_{jl} \right] \\
      & + \sum_k \sum_c \left[ PP_{kc} - TCWC_c \right] \times wp_{kc} \\
      & + \sum_m \sum_j \sum_l \left[ B^c_m \times PB_j - TCML_{ml} - MPRO_m \right] \times c_{mjl} \\
      & + \sum_k \left[ -W_k - \sum_m a_{mk} \times (TCMW_m + MPRO_m) \right] \times wc_k \\
      & + \sum_k \sum_j \sum_l \left( B^w_k \times PB_j - TCWL_l \right) \times wb_{kjl} \\
      & + \sum_k \sum_j \sum_l \left[ B^m_k \times PB_j - TCWL_l \right] \times mb_{kjl}
\end{aligned}
$$

Constraints

The constraints of SPM are presented below. Relations (21.1)-(21.8) all hold
for $j = 1, J$, $l \in L(j)$.

1. Mass balance for blended products:

$$ -bp_{jl} + \sum_m c_{mjl} + \sum_k wb_{kjl} + \sum_k mb_{kjl} = 0 \qquad (21.1) $$

2. Ash limits in blended products:

$$ -a^+_j bp_{jl} + \sum_m a^c_m c_{mjl} + \sum_k a^w_k wb_{kjl} + \sum_k a^m_k mb_{kjl} \le 0 \qquad (21.2) $$

$$ -a^-_j bp_{jl} + \sum_m a^c_m c_{mjl} + \sum_k a^w_k wb_{kjl} + \sum_k a^m_k mb_{kjl} \ge 0 \qquad (21.3) $$

3. Sulfur limits in blended products:

$$ -s^+_j bp_{jl} + \sum_m s^c_m c_{mjl} + \sum_k s^w_k wb_{kjl} + \sum_k s^m_k mb_{kjl} \le 0 \qquad (21.4) $$

$$ -s^-_j bp_{jl} + \sum_m s^c_m c_{mjl} + \sum_k s^w_k wb_{kjl} + \sum_k s^m_k mb_{kjl} \ge 0 \qquad (21.5) $$

4. Minimum BTU content in blended products:

$$ -\beta^-_j bp_{jl} + \sum_m \beta^c_m c_{mjl} + \sum_k \beta^w_k wb_{kjl} + \sum_k \beta^m_k mb_{kjl} \ge 0 \qquad (21.6) $$

5. Overall BTU supply to customers:

$$ BTU^-_j \le \sum_l \left( \sum_m B^c_m c_{mjl} + \sum_k B^w_k wb_{kjl} + \sum_k B^m_k mb_{kjl} \right) \le BTU^+_j \qquad (21.7) $$

6. Overall sulfur supplied to NSPC plants:

$$ \sum_{j \in NS} G_j \sum_l \left( \sum_m s^c_m c_{mjl} + \sum_k s^w_k wb_{kjl} + \sum_k s^m_k mb_{kjl} \right) \le S_{NS} \qquad (21.8) $$

7. Maximum and minimum mine production:

$$ MPR^-_m \le \sum_j \sum_l c_{mjl} + \sum_k a_{mk} wc_k \le MPR^+_m, \quad m = 1, M \qquad (21.9) $$

8. Mass balance in washplant:

$$ wc_k - \sum_j \sum_l wb_{kjl} - \sum_c wp_{kc} = 0, \quad k = 1, K \qquad (21.10) $$

9. Middlings ratio for washed products:

$$ r_k wc_k - \sum_j \sum_l mb_{kjl} = 0, \quad k = 1, K \qquad (21.11) $$

10. Nonnegativity constraints.


We compare this model with the multiperiod model by considering a col- 
lection of 12 single period models. 
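The quality constraints stay linear because each limit is multiplied by the blend tonnage rather than divided into a ratio. A minimal sketch of this idea for one blended product follows; it covers only the upper ash and sulfur limits and the minimum BTU limit, in the spirit of (21.2), (21.4) and (21.6). The function name and all numerical data are hypothetical.

```python
def blend_quality_ok(tonnes, ash, sulfur, btu, ash_max, sulfur_max, btu_min,
                     tol=1e-9):
    """Check linear quality constraints for one blended product.

    tonnes[i] is the amount of input i; ash[i], sulfur[i] (percent) and
    btu[i] (BTU/lb) are its analysis. Each test has the form
    sum_i q_i * t_i - limit * sum_i t_i <= 0 (or >= 0 for the BTU minimum).
    """
    total = sum(tonnes)
    ash_lhs = sum(a * t for a, t in zip(ash, tonnes)) - ash_max * total
    sul_lhs = sum(s * t for s, t in zip(sulfur, tonnes)) - sulfur_max * total
    btu_lhs = sum(b * t for b, t in zip(btu, tonnes)) - btu_min * total
    return ash_lhs <= tol and sul_lhs <= tol and btu_lhs >= -tol


if __name__ == "__main__":
    # Hypothetical inputs: a high sulfur ROM coal and a washed coal.
    ash, sulfur, btu = [12.0, 6.0], [3.0, 1.0], [11500.0, 13500.0]
    limits = dict(ash_max=10.0, sulfur_max=2.0, btu_min=12000.0)
    print(blend_quality_ok([40.0, 60.0], ash, sulfur, btu, **limits))  # True
    print(blend_quality_ok([80.0, 20.0], ash, sulfur, btu, **limits))  # False
```

In the second call the blend contains too much of the high sulfur coal, so the sulfur limit is violated even though the ROM coal on its own is cheaper to supply.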


21.2.2 Multiperiod nonlinear model (MNM) 


This model considers 12 periods where each period is 1 month long. The 
variables of MNM are similar to those of SPM with an additional subscript t 
to represent time period. To differentiate from SPM we use capital letters for 
variables. Although the constraints of this model in each period are similar 
to SPM, it has additional constraints to link one period to the next for the 
entire planning horizon. The model allows the inventory of blended prod- 
uct to be carried from one period to the next. The inventory variables and 
inventory balance constraints maintain the links in this multiperiod model. 
However, the quality (percentage of ash, sulfur and BTU content per pound)
parameters of blended coal inventories carried from one period to the next are
unknown, which introduces nonlinearity in the model. Although the details of
the mathematical model for MNM can be found in [Sarker and Gunn, 1997],
the mass balance constraint is presented below to give an idea about the
nature of variables and constraints in MNM:

$$ -BP_{jlt} - I_{jlt} + I_{jl(t-1)} + \sum_m C_{mjlt} + \sum_k WB_{kjlt} + \sum_k MB_{kjlt} = 0 \quad \forall j, l, t \qquad (21.12) $$

where

$BP_{jlt}$    (metric) tonnes of blended product j, supplied to customer
              j, made at location l in period t (jth product
              corresponds to jth customer)
$C_{mjlt}$    (metric) tonnes of run-of-mine coal from mine m used for
              blended product j at location l in period t
$WB_{kjlt}$   (metric) tonnes of washed product k used for blended
              product j at location l in period t
$MB_{kjlt}$   (metric) tonnes of middling product k used for blended
              product j at location l in period t
$I_{jlt}$     inventory of blended product j at location l
              at the end of period t

The above constraint indicates that the total amount of blended product 
j (produced for customers and inventory) is equal to the sum of its con- 
stituents of ROM coals, washed coals, middling products and blended coals 
from inventories. 
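The source of the nonlinearity can be illustrated with a small calculation. The sketch below assumes perfect mixing of the carried inventory with the new inputs in a single pile; the function name and the figures are hypothetical. The ash carried into a period is the product of the inventory tonnage and its ash fraction, and in the MNM both of these depend on earlier decisions.

```python
def inventory_after_period(inv_tonnes, inv_ash, inputs, shipped):
    """Blend carried inventory with new inputs and ship part of the result.

    inputs is a list of (tonnes, ash_fraction) pairs. Returns the tonnage
    left in inventory and the ash fraction of the blend, which is also the
    ash fraction of the inventory carried to the next period.
    """
    total = inv_tonnes + sum(t for t, _ in inputs)
    # inv_tonnes * inv_ash is the bilinear term: both factors are unknowns
    # in a multiperiod model that carries blended product.
    ash_mass = inv_tonnes * inv_ash + sum(t * a for t, a in inputs)
    blend_ash = ash_mass / total
    return total - shipped, blend_ash


if __name__ == "__main__":
    # Hypothetical data: 100 t of inventory at 10% ash plus 100 t at 6% ash.
    print(inventory_after_period(100.0, 0.10, [(100.0, 0.06)], 150.0))
```

With ROM inventory the analysis of each pile is known data, so the corresponding terms remain linear; this is the distinction drawn between the nonlinear and linear multiperiod models.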


21.2.3 Upper bound linear model (ULM)


The ULM is similar to MNM except that it allows the transfer of the inventory
of ROM from one period to any or all future periods within the planning
horizon. This model forms the upper bound of the problem since it consid- 
ers all possible savings from inventories and productions. The details of the 
model can be found in [Sarker, 2003]. 

This model allows carrying the most attractive input(s) in terms of quality 
and cost, for future periods. This is the upper bound of the planning problem 
because: 


1. this model considers all possible alternatives of supplying coals to cus- 
tomers and 

2. the solution to this model will give an objective value larger than or equal
to that of any feasible solution to the "true problem."


To give a feel for the nature of variables and constraints in ULM,
we present the mass balance constraint below:

$$ -BP_{jlt} + \sum_m \sum_{\tau \le t} C_{mjl\tau t} + \sum_k \sum_{\tau \le t} WB_{kjl\tau t} + \sum_k \sum_{\tau \le t} MB_{kjl\tau t} = 0 \qquad (21.13) $$

where

$C_{mjl\tau t}$    ROM coal of mine m produced in period $\tau$,
                   used in blended product j at location l in period t
$WB_{kjl\tau t}$   washed product k produced in period $\tau$,
                   used in blended product j at location l in period t
$MB_{kjl\tau t}$   middling product k produced in period $\tau$,
                   used in blended product j at location l in period t

The constraint states that the total amount of blended product j
(produced for customers only) is equal to the sum of its constituents of ROM
coals, washed coals and middling products taken from current and previous
periods.


21.2.4 Multiperiod linear model (MLM) 


The MLM is similar to ULM except that it only permits the carrying of
inventory of ROM coals from one period to the next, where the quality
parameters of ROM coals are known. That is, the run-of-mine and washed
coals produced in period t (= $\tau$) will be carried for further use in
period t+1 only. So the corresponding mass balance constraint will be as follows:

$$ -BP_{jlt} + \sum_m C_{mjl(t-1)t} + \sum_k WB_{kjl(t-1)t} + \sum_k MB_{kjl(t-1)t} = 0 \qquad (21.14) $$

where

$C_{mjl(t-1)t}$    ROM coal of mine m produced in period (t - 1),
                   used in blended product j at location l in period t
$WB_{kjl(t-1)t}$   washed product k produced in period (t - 1),
                   used in blended product j at location l in period t
$MB_{kjl(t-1)t}$   middling product k produced in period (t - 1),
                   used in blended product j at location l in period t


A number of new models can be formulated between MLM and ULM by
varying n, the number of periods for which the inventory of ROM and washed
coals can be carried forward (the maximum value of n is 11 in our 12-period
case). Please note that we intentionally omit the mathematical details
of MNM, ULM and MLM in this chapter, as they are too long and the emphasis
of the chapter is on comparisons; interested readers are referred to
[Sarker and Gunn, 1997] and [Sarker, 2003]. Alternatively they can be made
available by the author upon request.

These models differ in their capability of handling fluctuating situations, 
the computational time required, the size of the problem, optimal objective 
function values, number of coal banks required, etc. By a fluctuating situation 
we mean a variable planning environment. These aspects are discussed in the 
following sections. 
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SPM, ULM and MLM have been solved using the XMP linear program- 
ming codes. The MNM is a specially structured nonlinear program that is 
solved using a simple SLP algorithm developed by [Sarker and Gunn, 1997]. 


21.3 Model flexibility 


The SPM is the least flexible model and ULM is the most flexible model. The 
MNM is less flexible in choosing the inputs in comparison to MLM, but more 
flexible in handling fluctuating situations. In our computational experience, 
the objective function value of MNM is a little less than that of MLM when 
there is a stable demand and production pattern. This is due to the fact that 
the blended product inventory may not be an attractive input in the next 
period. In the following, we discuss how the models work under fluctuating 
demand and inputs. 

Consider the following simplified situations for a three period problem:
Let $X_1$, $X_2$ and $X_3$ be the maximum level of inputs available in periods
1, 2 and 3, and let $Q_1$, $Q_2$ and $Q_3$ be the respective demands in
periods 1, 2 and 3.


Case-1  $X_1 + X_2 + X_3 = Q_1 + Q_2 + Q_3$, $Q_2 = X_2$, $Q_1 < X_1$ and $Q_3 > X_3$


Case-2  $X_1 + X_2 + X_3 = Q_1 + Q_2 + Q_3$, $Q_3 = X_3$, $Q_1 < X_1$ and $Q_2 > X_2$


Case-3  $X_1 + X_2 + X_3 = Q_1 + Q_2 + Q_3$, $Q_1 < X_1$, $Q_2 > X_2$ and $Q_3 > X_3$


The simple line diagrams for these three cases are shown in parts (a)-(e)
of Figure 21.1. The models treat each of the cases as follows:


21.3.1 Case-1 


SPM (Figure 21.1a):

• $Q_1$ and $Q_2$ are satisfied, but $Q_3$ is not satisfied
• Shortage in period 3 = $Q_3 - X_3$
• Unused capacity in period 1 = $X_1 - Q_1$
• This model does not provide a feasible solution


MNM (Figure 21.1b):

• $Q_1$, $Q_2$ and $Q_3$ are satisfied
• $IQ_1 \le X_1 - Q_1$, $IQ_2 = IQ_1$ and $IQ_2 = Q_3 - X_3$
• Unused capacity in period 1 = $(X_1 + X_3) - (Q_1 + Q_3)$
• The model does provide a feasible solution


Fig. 21.1 Simple case problem.


MLM (Figure 21.1c):

• $Q_1$ and $Q_2$ are satisfied, but $Q_3$ is not satisfied
• $IX_1 = 0$, $IX_2 = 0$
• Shortage in period 3 = $Q_3 - X_3$
• Unused capacity in period 1 = $X_1 - Q_1$
• This model does not provide a feasible solution


ULM (Figure 21.1d):

• $Q_1$, $Q_2$ and $Q_3$ are satisfied
• $IX_1 = 0$, $IX_2 = 0$
• $IX_{13} \le X_1 - Q_1$, $IX_{13} = Q_3 - X_3$
• Unused capacity in period 1 = $(X_1 + X_3) - (Q_1 + Q_3)$
• The model does provide a feasible solution


Variant of MNM (Figure 21.1e):

• $Q_1$ and $Q_2$ are satisfied, but $Q_3$ is not satisfied
• $IQ_1 = 0$, $IQ_2 = 0$
• Shortage in period 3 = $Q_3 - X_3$
• Unused capacity in period 1 = $X_1 - Q_1$
• This model does not provide a feasible solution

21.3.2 Case-2 


SPM (Figure 21.1a):

• $Q_1$ and $Q_3$ are satisfied, but $Q_2$ is not satisfied
• Shortage in period 2 = $Q_2 - X_2$
• The model does not provide a feasible solution


MNM (Figure 21.1b):

• $Q_1$, $Q_2$ and $Q_3$ are satisfied
• $IQ_1 \le X_1 - Q_1$, $IQ_2 = 0$ and $IQ_1 = Q_2 - X_2$
• The model does provide a feasible solution


MLM (Figure 21.1c):

• $Q_1$, $Q_2$ and $Q_3$ are satisfied
• $IX_1 \le X_1 - Q_1$, $IX_2 = 0$ and $IX_1 = Q_2 - X_2$
• The model provides a feasible solution


ULM (Figure 21.1d):

• $Q_1$, $Q_2$ and $Q_3$ are satisfied
• $IX_{13} = 0$, $IX_2 = 0$
• $IX_1 \le X_1 - Q_1$, $IX_1 = Q_2 - X_2$
• The model provides a feasible solution


Variant of MNM (Figure 21.1e):

• $Q_1$, $Q_2$ and $Q_3$ are satisfied
• $IQ_1 \le X_1 - Q_1$, $IQ_2 = 0$ and $IQ_1 = Q_2 - X_2$
• The model provides a feasible solution


21.3.3 Case-3 


Only MNM and ULM provide feasible solutions. 

The MNM and ULM provide feasible solutions for all of the three cases. 
The MLM and variant of MNM give feasible solutions for case 2 only, and 
SPM does not provide any feasible solution for any of the three cases. 
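The three cases can be checked with a few lines of code. The sketch below encodes the simplified rules as they are applied in the case discussion, which is an assumption about the intent rather than the full models: the SPM carries nothing, the MLM (and the variant of MNM) lets a surplus in one period cover a shortage only in the next period, and the ULM (and MNM) lets any earlier surplus cover a later shortage. The function names and the numerical data are hypothetical.

```python
from itertools import accumulate


def spm_feasible(X, Q):
    """No inventory: demand must be met from the same period's inputs."""
    return all(q <= x for x, q in zip(X, Q))


def one_period_carry_feasible(X, Q):
    """Surplus of period t may only cover a shortage in period t + 1."""
    prev_surplus = 0
    for x, q in zip(X, Q):
        if q - x > prev_surplus:
            return False
        prev_surplus = max(x - q, 0)
    return True


def cumulative_feasible(X, Q):
    """Any earlier surplus may be used: cumulative supply covers demand."""
    return all(cx >= cq for cx, cq in zip(accumulate(X), accumulate(Q)))


# Hypothetical data satisfying the definitions of Case-1, Case-2 and Case-3.
CASES = {
    "Case-1": ([120, 100, 80], [100, 100, 100]),
    "Case-2": ([120, 80, 100], [100, 100, 100]),
    "Case-3": ([140, 80, 80], [100, 100, 100]),
}

if __name__ == "__main__":
    for name, (X, Q) in CASES.items():
        print(name, spm_feasible(X, Q), one_period_carry_feasible(X, Q),
              cumulative_feasible(X, Q))
```

For these data the one-period rule is feasible only in Case-2 and the cumulative rule is feasible in all three cases, in agreement with the summary above.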


21.4 Problem size and computation time 


The ULM is a simple but large linear program. We can solve a reasonably 
large linear program without much difficulty. The MLM model is also a linear 
program. It is smaller than the upper bounding model. The MNM is smaller 
than the ULM and close to MLM, but it takes the largest computational time. 
In our study, we solved 36 test problems. In the test problems, we considered 
the number of blended products up to 2, blending locations up to 3, number 
of mines up to 3, coal washing facilities up to 3 and the time periods 3 to 12. 
The arbitrary demand, capacity and quality data were randomly generated 
for different test problems. The ranges for some monthly data are: blended 
product demand, 200,000 to 250,000 (metric) tonnes, washed coal demand, 
290,000 to 550,000 (metric) tonnes, production capacity of mine-1, 85,000 
to 270,000 (metric) tonnes, capacity of mine-2, 95,000 to 300,000 (metric) 
tonnes and capacity of mine-3, 70,000 to 240,000 (metric) tonnes. The relative 
problem sizes of the models are shown in Table 21.1. 


Table 21.1 Relative problem size of ULM, MLM and MNM

         Minimum Problem Size         Maximum Problem Size
         Number of     Number of     Number of     Number of
Model    Constraints   Variables     Constraints   Variables
ULM      66            49            576           4770
MLM      66            44            576           1800
MNM      75            46+9          792           1494+216


For the largest problem, the number of variables in MNM is (1494+216) =
1710. Out of these 1710 variables, 216 variables are additional variables, which
are required to solve the model using an SLP algorithm [Sarker and Gunn, 1997].
The ULM and MLM have a similar number of constraints and the MLM has 
many fewer variables. For the largest problem, the ULM model contains 576 
constraints and 4770 variables, whereas the MLM contains 576 constraints 
and 1800 variables. With an increasing number of blended products, blending 
locations, washed products, inputs and customers, the ULM becomes a very 
large problem in comparison to the other two models. 

The ULM is a possible candidate for the planning problem under consid- 
eration. This model could be a very large linear program with a large number 
of blended products, washed products, customers, mines and blending loca- 
tions. If the problem size is too large, one could consider the following points 
to reduce the size of the problem without losing the characteristics of the 
model. 


1. Reduce the number of periods in the planning horizon. Consider 5 periods 
(3 of one month, 1 of three months and 1 of six months) instead of 12 
periods. 

2. Group the customers based on similar prices offered and transportation 
costs. 

3. Reduce or group the number of products based on similar quality param- 
eters. 

4. Omit the quality constraints for blended product. 


These considerations will provide a solution in more aggregate form. Greater
aggregation makes disaggregation more difficult under an unstable planning
environment. In such a case, the reformulation of the disaggregate model may
be necessary to obtain a detailed and practically feasible solution.

Though we ignore the physical inventory in the upper bounding model, the 
model does not require extensive inventory carry through. We have examined 
closely the blending processes considered in modeling MNM and MLM. The 
way of dealing with inventory is one of the major differences between these 
two models. We allow the inventory of blended product in MNM and inven- 
tory of run-of-mine and washed coals in MLM. In both cases, the inventories 
were carried for use in the next period. However, the inventory of blended 
product (in MNM) of a period can be used in further future periods through 
blending in the intermediate periods. 


21.5 Objective function values and fluctuating situation 


The SPM, considered in this chapter, is a collection of 12 single period models
without any linking mechanism from one period to the next. This means
the model does not allow carrying of any inventory. This model gives a lower
bound of the planning problem for profit maximization and cannot be applied
in fluctuating cases. The ULM is a Land algorithm or transportation type
model. This model considers all possible savings from carrying inventories
of inputs for the blending process. This model gives an upper bound of the
problem. The computational comparisons of these models are presented in
Table 21.2.


Table 21.2 Objective function values of ULM, MLM and MNM 


         Objective Value of Smallest    Objective Value of Largest 
Model    Problem (Million Dollars)      Problem (Million Dollars) 
ULM      27.817                         117.852 
MLM      27.778                         117.327 
MNM      27.771                         116.011 


The SPM is infeasible for both cases. Normally the MLM shows lower profit 
than that of the ULM and higher than that of the SPM. However, the MLM 
becomes infeasible in highly fluctuating situations. The MNM also shows 
lower profit than that of the ULM, higher than that of the SPM and close 
to that of the MLM for most cases. The MNM can handle a highly fluctuating 
situation as well as the ULM. 
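
As a rough check on how close the models are, the relative gap between the 
ULM upper bound and the MLM and MNM values in Table 21.2 can be computed 
directly. The sketch below uses only the figures from the table; the names 
OBJECTIVE and gap_to_upper_bound are illustrative assumptions. 

```python
# Objective values (million dollars) copied from Table 21.2.
OBJECTIVE = {
    "ULM": {"smallest": 27.817, "largest": 117.852},
    "MLM": {"smallest": 27.778, "largest": 117.327},
    "MNM": {"smallest": 27.771, "largest": 116.011},
}


def gap_to_upper_bound(model, problem, table=OBJECTIVE):
    """Percentage by which `model` falls short of the ULM value."""
    upper = table["ULM"][problem]
    return 100.0 * (upper - table[model][problem]) / upper


gaps = {
    model: {p: gap_to_upper_bound(model, p) for p in ("smallest", "largest")}
    for model in ("MLM", "MNM")
}

for model, by_problem in gaps.items():
    for problem, gap in by_problem.items():
        print(f"{model}, {problem} problem: {gap:.3f}% below ULM")
```

On these figures both the MLM and the MNM stay within 2 percent of the ULM 
upper bound for the smallest and the largest problem. 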


21.6 Selection criteria 


The above analysis is used to suggest an appropriate model for the planning 
problem. The ULM model can be proposed for use as a tactical model. The 
analysis shows that 


1. the ULM ensures the highest profit, 

2. it takes less computational time than does the nonlinear model, 

3. it provides flexibility to choose inputs and tackle fluctuating situations, 
and 

4. for practicality, the number of banks (coal piles) suggested by the model 
is much lower than that predicted by theoretical calculations. 


The selection of ULM as a planning model may raise a question as to 
the use of MNM. The MNM is an alternative formulation approach for the 
planning issues which follows the concept of traditional formulation of mul- 
tiperiod planning. This model also allows us to check the solution of ULM or 
MLM. The solutions of these models are reasonably close as expected. One 
may prefer to carry an inventory of blended product instead of run-of-mine 
and washed coals, and take advantage of managing fewer banks using 
the solution of MNM. The development of the solution method for MNM 
has given us that opportunity. The use of available LP codes in solving a 
large nonlinear program makes the algorithm very attractive for practical 
applications. The algorithm can also be used for solving other multiperiod 
blending problems like food or crude oil blending. Since the algorithm 
developed for the MNM can be generalized for a class of nonlinear programs, 
its contribution to knowledge is justified. 


21.7 Conclusions 


We addressed a real-world coal-blending problem. The coal-blending problem 
can be modeled in a number of different ways depending on the portion of 
reality to be excluded. The alternative mathematical programming models 
under different scenarios are discussed and analyzed. 

A coal-blending model has been selected from a number of alternative 
models by comparing: 


1. the computational complexity, 

2. the model flexibility, 

3. the number of banks required, and 

4. the objective function value. 


The upper bound linear programming model seems appropriate because 


1. it shows the highest profit, 

2. it takes less computational time than does the nonlinear model, 

3. it is the most flexible model in tackling an unstable planning environment, 

4. for practicality, the number of banks suggested by the model is much lower 
than that predicted by theoretical calculations, and 

5. it is implementable. 